Inferential Statistics and Data Analytics Guide
INFERENTIAL STATISTICS
Populations – samples – random sampling – Sampling distribution- standard error of the mean - Hypothesis testing –
z-test – z-test procedure –decision rule – calculations – decisions – interpretations - one-tailed and two-tailed tests –
Estimation – point estimate – confidence interval – level of confidence – effect of sample size.
Data Analytics:
● Analytics is defined as “the scientific process of transforming data into insights for making better
decisions”
● Analytics is the use of data, information technology, statistical analysis, quantitative methods, and
mathematical or computer-based models to help managers gain improved insight about their business
operations and make better, fact-based decisions – James Evans
Opportunity abounds for the use of analytics and big data across many business settings.
Based on the phase of workflow and the kind of analysis required, there are four major types of data analytics.
• Descriptive analytics
• Diagnostic analytics
• Predictive analytics
• Prescriptive analytics
1. Descriptive Analytics:
● Descriptive Analytics is the conventional form of Business Intelligence and data analysis.
● It seeks to provide a depiction or “summary view” of facts and figures in an understandable format.
● It either informs directly or prepares the data for further analysis.
● Descriptive analysis or statistics can summarize raw data and convert it into a form that can be easily
understood by humans.
● It can describe in detail an event that has occurred in the past.
2. Diagnostic analytics:
● Diagnostic Analytics is a form of advanced analytics which examines data or content to answer the
question “Why did it happen?”.
● Diagnostic analytical tools aid an analyst to dig deeper into an issue so that they can arrive at the source of
a problem.
● In a structured business environment, tools for descriptive and diagnostic analytics are typically used in parallel.
3. Predictive analytics:
● Predictive analytics helps to forecast trends based on current events.
● The probability of an event happening in the future, or an estimate of when it will happen, can be
determined with the help of predictive analytical models.
● Many different but co-dependent variables are analyzed to predict a trend in this type of analysis.
4. Prescriptive analytics:
● Set of techniques to indicate the best course of action
● It tells what decision to make to optimize the outcome
● The goal of prescriptive analytics is to enable:
1. Quality improvements
2. Service enhancements
3. Cost reductions
4. Increasing productivity
Descriptive Statistics:
Descriptive statistics can be used to summarize and describe a single variable (see the sketch after the following list).
• Frequencies (counts) & Percentages
– Use with categorical (nominal) data
• Levels, types, groupings, yes/no, Drug A vs. Drug B
• Means & Standard Deviations
– Use with continuous (interval/ratio) data
• Height, weight, cholesterol, scores on a test
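As a rough illustration, the pandas sketch below (with made-up values and hypothetical column names) computes counts and percentages for a categorical variable and the mean and standard deviation for a continuous one.

```python
import pandas as pd

# Hypothetical data: 'drug' is categorical (nominal), 'cholesterol' is continuous (ratio).
df = pd.DataFrame({
    "drug": ["A", "A", "B", "B", "B", "A"],
    "cholesterol": [210, 195, 180, 220, 205, 190],
})

# Frequencies (counts) and percentages for the categorical variable.
counts = df["drug"].value_counts()
percentages = df["drug"].value_counts(normalize=True) * 100

# Mean and standard deviation for the continuous variable.
mean_chol = df["cholesterol"].mean()
sd_chol = df["cholesterol"].std()  # sample standard deviation (ddof=1)

print(counts)
print(percentages.round(1))
print(f"mean = {mean_chol:.1f}, sd = {sd_chol:.1f}")
```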
Inferential Statistics:
Inferential statistics can be used to test theories, determine associations between variables,
and determine whether findings are significant and whether we can generalize from our sample to the entire
population. In other words, inferential statistics are used to draw conclusions about a population by examining a
sample. The accuracy of the inference depends on how representative the sample is of the population.
Random selection, which gives every member an equal chance of being selected, makes the sample more representative.
1.Populations:
● Any complete set of observations (or potential observations) may be characterized as a population. A
population can also be defined as including all people or items with the characteristic one wishes to
understand.
Real Population:
● A real population is one in which all potential observations are accessible at the time of sampling.
Hypothetical Population:
● A hypothetical population is one in which all potential observations are not accessible at the time of
sampling.
Examples: All likely voters in the next election
All parts produced today
All sales receipts for November
2.Sample:
Any subset of observations from a population may be characterized as a sample. A sample is “a smaller
(but hopefully representative) collection of units from a population used to determine truths about that population”.
Examples: 1000 voters selected at random for interview
A few parts selected for destructive testing
Random receipts selected for audit
3.Random Sampling:
Random sampling is the selection process that guarantees all potential observations in the population have
an equal chance of being included in the sample.
It’s important to note that randomness describes the selection process—that is, the conditions under which
the sample is taken—and not the particular pattern of observations in the sample.
Types and examples of random sampling techniques.
There are four main types of random sampling techniques (a short code sketch follows this list) –
● Simple Random Sampling technique – In this technique, a sample is chosen using randomly
generated numbers. A sampling frame listing the members of the population is required; its size is
denoted by ‘N’. Using Excel, for example, one can generate a random number for each element and then
select the required elements.
● Systematic Random Sampling technique – This technique is very common and easy to use in statistics. In
this technique, every kth element is sampled: one element is chosen, and subsequent elements are taken
after skipping a pre-defined number of elements. In a sampling frame, divide the size of the
frame N by the sample size n to get the interval ‘k’, then pick every kth element to create your
sample.
● Cluster Random Sampling technique -In this technique, the population is divided into clusters or groups
in such a way that each cluster represents the population. After that, you can randomly select clusters to
sample.
● Stratified Random Sampling technique – In this technique, the population is divided into groups that
have similar characteristics. Then a random sample can be taken from each group to ensure that
different segments are represented equally within a population.
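The following Python sketch illustrates three of these techniques on a small, hypothetical sampling frame (the frame, column names, and sizes are invented for illustration); cluster sampling would follow the same pattern by randomly selecting whole groups instead of individual members.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=1)

# Hypothetical sampling frame of N = 20 members, each tagged with a stratum.
frame = pd.DataFrame({
    "member_id": range(1, 21),
    "stratum": ["urban"] * 12 + ["rural"] * 8,
})
N, n = len(frame), 5

# Simple random sampling: every member has an equal chance of selection.
simple = frame.sample(n=n, random_state=1)

# Systematic sampling: take every k-th member after a random start, with k = N // n.
k = N // n
start = int(rng.integers(0, k))
systematic = frame.iloc[start::k]

# Stratified sampling: draw the same fraction from each stratum so all segments are represented.
stratified = frame.groupby("stratum").sample(frac=n / N, random_state=1)

print(simple, systematic, stratified, sep="\n\n")
```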
Conditional Probability
To obtain the probability that two dependent events occur together, the probability of the second event must
be adjusted to reflect its dependency on the prior occurrence of the first event. This new probability is the
conditional probability of the second event, given the first event.
It can also be defined as the probability of one event, given the occurrence of another event.
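As a small worked illustration (not from the source text), consider drawing two cards from a standard deck without replacement; the probability of an ace on the second draw must be conditioned on the outcome of the first draw.

```python
from fractions import Fraction

# Two dependent events: drawing two cards from a standard deck without replacement.
# The probability for the second draw must be adjusted for the outcome of the first.
p_first_ace = Fraction(4, 52)                # P(ace on the first draw)
p_second_ace_given_first = Fraction(3, 51)   # P(ace on the second draw | ace on the first)

# Multiplication rule for dependent events: P(A and B) = P(A) * P(B | A)
p_both_aces = p_first_ace * p_second_ace_given_first
print(p_both_aces, float(p_both_aces))       # 1/221, roughly 0.0045
```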
4.SAMPLING DISTRIBUTION:
● A sampling distribution is the probability distribution of a statistic (such as the mean) computed from
samples of a given size drawn repeatedly from a large population.
● Its primary purpose is to establish representative results of small samples of a comparatively larger
population. Since the population is too large to analyze, the smaller group is selected and repeatedly
sampled or analyzed.
● The gathered data, or statistics, is used to calculate the likely occurrence, or probability, of an event.
● Using a sampling distribution simplifies the process of making inferences, or conclusions, about large
amounts of data.
The mean of the sampling distribution of the mean equals the population mean:
µx̄ = µ
where µx̄ represents the mean of the sampling distribution and µ represents the mean of the population.
The standard error of the mean equals the population standard deviation divided by the square root of the sample size:
σx̄ = σ/√n
where σx̄ represents the standard error of the mean, σ represents the standard deviation of the population, and n
represents the sample size.
Problem: Imagine a very simple population consisting of only four observations: 18, 20, 22, 24.
(a) List all possible samples of size two.
(b) Construct a relative frequency table showing the sampling distribution of the mean.
Solution: the list of samples and the relative frequency table can be constructed as in the sketch below.
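A minimal Python sketch of parts (a) and (b), assuming the four listed values and sampling with replacement (a common textbook convention for such mini-populations); it also verifies that the mean of the sampling distribution equals the population mean and that its standard deviation (the standard error) equals σ/√n.

```python
from itertools import product
from collections import Counter
from statistics import mean, pstdev

population = [18, 20, 22, 24]

# (a) All possible samples of size two, assuming sampling with replacement (4 x 4 = 16 samples).
samples = list(product(population, repeat=2))

# (b) Sampling distribution of the mean as a relative frequency table.
sample_means = [mean(s) for s in samples]
rel_freq = {m: c / len(sample_means) for m, c in sorted(Counter(sample_means).items())}
for m, f in rel_freq.items():
    print(f"mean = {m:>4}  relative frequency = {f:.4f}")

# Check: the mean of the sampling distribution equals the population mean (21),
# and its standard deviation (the standard error) equals sigma / sqrt(n).
print(mean(sample_means), mean(population))
print(pstdev(sample_means), pstdev(population) / 2 ** 0.5)
```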
6.Hypothesis testing:
Hypothesis testing is a form of statistical inference that uses data from a sample to draw conclusions about
a population parameter or a population probability distribution. First, a tentative assumption is made about
the parameter or distribution. This assumption is called the null hypothesis and is denoted by H0. An alternative
hypothesis (denoted Ha), which is the opposite of what is stated in the null hypothesis, is then defined. The
hypothesis-testing procedure involves using sample data to determine whether or not H0 can be rejected. If H0 is
rejected, the statistical conclusion is that the alternative hypothesis Ha is true.
● Null Hypothesis: The null hypothesis is a statement that the value of a population parameter (such as
proportion, mean, or standard deviation) is equal to some claimed value. We either reject or fail to reject the null
hypothesis. Null Hypothesis is denoted by H0.
● Alternate Hypothesis: The alternative hypothesis is the statement that the parameter has a value that differs
from the claimed value. It is denoted by Ha.
Level of significance: This refers to the degree of significance at which we accept or reject the null hypothesis.
Since 100% accuracy in accepting or rejecting a hypothesis is not possible in most experiments, we therefore
select a level of significance. It is denoted by alpha (α).
For example, assume that a radio station selects the music it plays based on the assumption that the average age of
its listening audience is 30 years.
To determine whether this assumption is valid, a hypothesis test could be conducted with the null
hypothesis given as H0: μ = 30 and the alternative hypothesis given as Ha: μ ≠ 30.
Based on a sample of individuals from the listening audience, the sample mean age, x̄, can be computed
and used to determine whether there is sufficient statistical evidence to reject H0.
Conceptually, a value of the sample mean that is “close” to 30 is consistent with the null hypothesis, while
a value of the sample mean that is “not close” to 30 provides support for the alternative hypothesis. What is
considered “close” and “not close” is determined by using the sampling distribution of x̄.
As another example, consider SAT scores with a national average of 500. The null hypothesis that the population
mean for the freshman class equals 500 is tentatively assumed to be true. It is tested by determining whether the
one observed sample mean qualifies as a common outcome or a rare outcome in the hypothesized sampling distribution.
Common Outcomes
An observed sample mean qualifies as a common outcome if the difference between its value and that of the
hypothesized population mean is small enough to be viewed as a probable outcome under the null hypothesis.
A common outcome signifies a lack of evidence that, with respect to the null hypothesis, something
special is happening in the underlying population.
Rare Outcomes
An observed sample mean qualifies as a rare outcome if the difference between its value and the
hypothesized population mean is too large to be reasonably viewed as a probable outcome under the null
hypothesis.
A rare outcome signifies that, with respect to the null hypothesis, something special probably is happening
in the underlying population.
Boundaries for Common and Rare Outcomes
Superimposed on the hypothesized sampling distribution is one possible set of boundaries for
common and rare outcomes, expressed in values of x̄.
If the one observed sample mean is located between 478 and 522, it will qualify as a common
outcome (readily attributed to variability) under the null hypothesis, and the null hypothesis will be retained. If,
however, the one observed sample mean is greater than 522 or less than 478, it will qualify as a rare outcome (not
readily attributed to variability) under the null hypothesis, and the null hypothesis will be rejected. These
boundaries correspond to roughly two standard errors on either side of the hypothesized mean of 500: with a
standard error of about 11, 500 − 2(11) = 478 and 500 + 2(11) = 522.
7. Z-test:
A z-test is a statistical test in which the distribution of the test statistic under the null hypothesis can be
approximated by a normal distribution. It is used to determine whether a sample mean differs significantly from a
hypothesized population mean (or whether two sample means differ) when the population variance is known and the
sample size is large (n ≥ 30).
When to Use Z-test:
o The sample size should be greater than 30. Otherwise, we should use the t-test.
o Samples should be drawn at random from the population.
o The standard deviation of the population should be known.
o Samples that are drawn from the population should be independent of each other.
o The data should be normally distributed; however, for a large sample size, the sampling distribution of
the mean is assumed to be approximately normal.
8. Z-test procedure:
1. First, identify the null and alternate hypotheses.
2. Determine the level of significance (α).
3. Find the critical value of z from the z-table, based on α and on whether the test is one-tailed or two-tailed.
4. Calculate the z-test statistic using the formula
z = (x̄ − µ) / (σ/√n)
where x̄ is the mean of the sample, µ is the mean of the population, σ is the standard deviation of the
population, and n is the sample size.
5. Compare the calculated z statistic with the critical value and decide whether or not to reject the null hypothesis.
Types of Z-test
● Left-tailed Test: In this test, the region of rejection is located at the extreme left of the distribution.
Here the null hypothesis is that the population mean is greater than or equal to the claimed value.
● Right-tailed Test: In this test, the region of rejection is located at the extreme right of the distribution.
Here the null hypothesis is that the population mean is less than or equal to the claimed value.
● Two-tailed test: In this test, the region of rejection is located at both extremes of the distribution. Here
the null hypothesis is that the population mean is equal to the claimed value.
Problem: A school principal claims that the students in the school are more intelligent than the average. On calculating
the IQ scores of 50 students, the average turns out to be 110. The mean of the population IQ is 100 and the standard
deviation is 15. State whether the principal's claim is right or not at a 5% significance level.
1. First, we define the null hypothesis and the alternate hypothesis: H0: µ = 100
Ha: µ > 100
2. State the level of significance. Here, the level of significance given in the question is α = 0.05; if it is not
given, we take α = 0.05.
3. Now, we look up the z-table. For α = 0.05, the z-score for a right-tailed test is 1.645.
4. Now, we perform the z-test on the problem:
z = (x̄ − µ) / (σ/√n) = (110 − 100) / (15/√50) ≈ 4.71
where x̄ = 110 (sample mean), µ = 100 (population mean), σ = 15 (population standard deviation),
α = 0.05, and n = 50.
Since 4.71 > 1.645, we reject the null hypothesis. If the z-test statistic had been less than the critical z-score, we
would not have rejected the null hypothesis.
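A short Python sketch of steps 1–4 for this problem, using scipy for the right-tailed critical value; the numbers are those given above.

```python
from math import sqrt
from scipy.stats import norm

# Given values from the problem.
x_bar, mu, sigma, n, alpha = 110, 100, 15, 50, 0.05

# Step 3: right-tailed critical value for alpha = 0.05.
z_critical = norm.ppf(1 - alpha)              # about 1.645

# Step 4: z-test statistic.
z = (x_bar - mu) / (sigma / sqrt(n))          # about 4.71

# Decision: reject H0 if the observed z exceeds the critical value.
print(f"z = {z:.2f}, critical z = {z_critical:.3f}")
print("Reject H0" if z > z_critical else "Fail to reject H0")
```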
9. Decision rule:
A decision rule specifies precisely when the null hypothesis (H0) should be rejected: H0 should be rejected
if the observed z equals or is more positive than the upper critical z value, or equals or is more negative than the
lower critical z value.
Example:
Suppose the critical z value at the 0.05 significance level is 1.96. Then the null hypothesis is to be
rejected if the observed z equals or is more positive than 1.96, or equals or is more negative than −1.96. Conversely,
the null hypothesis should be retained if the observed z falls between −1.96 and +1.96.
Critical z-scores:
A critical z score separates common from rare outcomes and hence dictates whether H0 should be retained or rejected.
Because of their vital role in the decision about H0, these scores are referred to as critical z scores.
Level of Significance:
The proportion (.025 + .025 = .05) of the total area identified with rare outcomes is often referred to as the level
of significance of the statistical test, and it is symbolized by the Greek letter α (alpha). It is the degree of rarity
required of an observed outcome in order to reject the null hypothesis (H0). For instance, the .05 level of
significance indicates that H0 should be rejected if the observed z could have occurred just by chance with a
probability of only .05 (one chance out of twenty) or less.
10. Decisions:
The decision is either to retain or to reject H0, depending on the location of the observed z value relative to the
critical z values specified in the decision rule. According to the present rule, H0 should be rejected at the .05 level
of significance because the observed z of 3 exceeds the critical z of 1.96 and, therefore, qualifies as a rare outcome,
that is, an unlikely outcome from a population centered about the null hypothesis.
Retain or Reject H0?
If you are ever confused about whether to retain or reject H0, recall the logic behind the hypothesis test.
You want to reject H0 only if the observed value of z qualifies as a rare outcome because it deviates too far into the
tails of the sampling distribution. Therefore, you want to reject H0 only if the observed value of z equals or is more
positive than the upper critical z (1.96) or if it equals or is more negative than the lower critical z (–1.96).
Before deciding, you might find it helpful to sketch the hypothesized sampling distribution, along with its
critical z values and shaded rejection regions, and then use some mark, such as an arrow, to designate the location
of the observed value of z (3) along the z scale. If this mark is located in the shaded rejection region, or farther out
than this region, then H0 should be rejected.
11. Interpretation:
Finally, interpret the decision in terms of the original research problem. Although not a strict consequence
of the present test, a more specific conclusion is possible.
Example:
Suppose the research question is whether the mean SAT math score for the local freshman class differs from the
national average of 500, and the observed sample mean is 533.
Since the null hypothesis was rejected, and since the sample mean of 533 (or its equivalent z of 3) falls in the
upper rejection region of the hypothesized sampling distribution, it can be concluded that the population mean SAT
math score for all local freshmen probably exceeds the national average of 500.
By the same token, if the observed sample mean or its equivalent z had fallen in the lower rejection region
of the hypothesized sampling distribution, it could have been concluded that the population mean for all local
freshmen probably is below the national average.
If the observed sample mean or its equivalent z had fallen in the retention region of the hypothesized
sampling distribution, it would have been concluded that there is no evidence that the population mean for all
local freshmen differs from the national average of 500.
Two-Tailed Test:
Two-tailed hypothesis tests are also known as non-directional or two-sided tests because they can detect
effects in both directions. When you perform a two-tailed test, you split the significance level between
both tails of the distribution. For example, an alpha of 5% is split so that the distribution has two shaded regions of
2.5% each (2 × 2.5% = 5%).
When a test statistic falls in either critical region, the sample data are sufficiently incompatible with the null
hypothesis that you can reject it for the population.
In a two-tailed test, the generic null and alternative hypotheses take the form
H0: µ = µ0    H1: µ ≠ µ0
so the alternative hypothesis, H1, is the complement of the null hypothesis, H0. Under typical conditions,
the form of H1 resembles that shown for the SAT example, namely
H1: µ ≠ 500
This alternative hypothesis says that the null hypothesis should be rejected if the mean reading score for the
population of local freshmen differs in either direction from the national average of 500. An observed z will qualify
as a rare outcome if it deviates too far either below or above the national average.
The corresponding decision rule, with its pair of critical z scores of ±1.96, is referred to as a two-tailed or
nondirectional test.
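A small Python sketch of a two-tailed decision for the SAT example (observed z of 3, α = .05), using scipy for the ±1.96 critical values and the two-tailed p-value; this is an illustration, not part of the original worked example.

```python
from scipy.stats import norm

alpha = 0.05
z_observed = 3.0   # z for the sample mean of 533 in the SAT example

# Two-tailed critical values: split alpha between the two tails.
z_lower, z_upper = norm.ppf(alpha / 2), norm.ppf(1 - alpha / 2)   # about -1.96 and +1.96

# Two-tailed p-value: probability of a result at least this extreme in either direction.
p_value = 2 * (1 - norm.cdf(abs(z_observed)))                     # about 0.0027

reject = z_observed <= z_lower or z_observed >= z_upper
print(f"critical z = ({z_lower:.2f}, {z_upper:.2f}), p = {p_value:.4f}, reject H0: {reject}")
```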
One-Tailed Test:
One-tailed (directional) tests place the entire rejection region in a single tail of the distribution. They take one of
two forms:
Type 1:
Null: The effect is less than or equal to zero. Alternative: The effect is greater than zero.
Type 2:
Null: The effect is greater than or equal to zero. Alternative: The effect is less than zero.
The disadvantage of one-tailed tests is that they have no statistical power to detect an effect in the other direction.
13.Estimation:
A point estimate for μ uses a single value to represent the unknown population mean. This is the most
straightforward type of estimate. If a random sample of 100 local freshmen reveals a sample mean SAT score of 533,
then 533 will be the point estimate of the unknown population mean for all local freshmen. The best single point
estimate for the unknown population mean is simply the observed value of the sample mean.
Drawbacks: Although straightforward, simple, and precise, point estimates suffer from a basic deficiency. They
tend to be inaccurate. Because of sampling variability, it’s unlikely that a single sample mean, such as 533, will
coincide with the population mean. Since point estimates convey no information about the degree of inaccuracy due
to sampling variability, statisticians supplement point estimates with another, more realistic type of estimate, known
as interval estimates or confidence intervals.
14.Confidence Interval:
A confidence interval for μ uses a range of values that, with a known degree of certainty, includes the unknown
population mean.
For instance, the SAT investigator might use a confidence interval to claim, with 95 percent confidence,
that the interval between 511.44 and 554.56 includes the population mean math score for all local freshmen. To be
95 percent confident signifies that if many of these intervals were constructed for a long series of samples,
approximately 95 percent would include the population mean for all local freshmen. In the long run, 95 percent of
these confidence intervals are true because they include the unknown population mean. The remaining 5 percent are
false because they fail to include the unknown population mean.
How It Works:
▪ The mean of the sampling distribution equals the unknown population mean for all local freshmen,
whatever its value, because the mean of this sampling distribution always equals the population mean.
▪ The standard error of the sampling distribution equals the value (11) obtained from dividing the population
standard deviation (110) by the square root of the sample size (100).
▪ The shape of the sampling distribution approximates a normal distribution because the sample size of 100
satisfies the requirements of the central limit theorem.
Only one sample mean is actually taken from this sampling distribution and used to construct a single 95
percent confidence interval. However, imagine taking not just one but a series of randomly selected sample
means from this sampling distribution. For each sample mean, construct a 95 percent confidence interval by
adding 1.96 standard errors to the sample mean and subtracting 1.96 standard errors from the sample mean; that
is, use the expression x̄ ± 1.96σx̄ to obtain a 95 percent confidence interval for each sample mean.
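A minimal Python sketch of the 95 percent confidence interval described above (x̄ = 533, σ = 110, n = 100), computed as x̄ ± 1.96 standard errors.

```python
from math import sqrt
from scipy.stats import norm

x_bar, sigma, n, confidence = 533, 110, 100, 0.95

standard_error = sigma / sqrt(n)            # 110 / 10 = 11
z = norm.ppf(1 - (1 - confidence) / 2)      # about 1.96

lower = x_bar - z * standard_error
upper = x_bar + z * standard_error
print(f"{confidence:.0%} CI: ({lower:.2f}, {upper:.2f})")   # about (511.44, 554.56)
```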