Department of Artificial Intelligence & Data Science
New Horizon Institute of Technology and Management
Module – 2
Data and Sampling Distributions
(05 Hrs)
CSDLO5011.2:
Student will be able to: Describe Data and Sampling Distributions.
Topics to be covered:
Unit 2.1
  Week 6: Random Sampling and Sample Bias, Random Selection, Size Versus Quality, Sample Mean Versus Population Mean, Selection Bias
  Week 7: Regression to the Mean, Sampling Distribution of a Statistic, Central Limit Theorem, Standard Error, The Bootstrap, Resampling Versus Bootstrapping
Unit 2.2
  Week 8: Confidence Intervals, Normal Distribution, Standard Normal and QQ-Plots
  Week 9: Long-Tailed Distributions, Student's t-Distribution, Binomial Distribution, Chi-Square Distribution
  Week 10: F-Distribution, Poisson and Related Distributions, Poisson Distributions
  Week 11: Exponential Distribution, Estimating the Failure Rate, Weibull Distribution
Data and Sampling Distribution
Random Sampling and Sample Bias
• Sample: A subset from a larger data set.
• Population: The larger data set or idea of a data set.
• N (n): The size of the population (sample).
• Random sampling: Drawing elements into a sample at random.
• Stratified sampling: Dividing the population into strata and randomly
sampling from each stratum.
• Stratum (pl., strata): A homogeneous subgroup of a population with
common characteristics.
• Simple random sample: The sample that results from random sampling
without stratifying the population.
• Bias: Systematic error.
• Sample bias: A sample that misrepresents the population.
Sampling
Sampling:
o A data reduction technique.
o Allows a large data set to be represented by a much smaller random
subset of it.
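Sampling Techniques:
o Probability sampling: simple random, systematic, stratified and cluster sampling.
o Non-probability sampling: convenience, quota, judgement and snowball sampling.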
Simple Random Sampling:
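o Without Replacement: Every element of the population is equally likely to be
selected, and an element, once drawn, is not returned, so it cannot be selected again.
o With Replacement: Each time an element is drawn, it is recorded and then returned
to the population, so it may be drawn again.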
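The difference between the two variants can be seen with Python's standard `random` module: `Random.sample` draws without replacement and `Random.choices` draws with replacement. The population of 20 labelled units is hypothetical.

```python
import random

rng = random.Random(7)
population = list(range(1, 21))  # hypothetical population of 20 labelled units

# Without replacement: each unit can be selected at most once
without_repl = rng.sample(population, 10)

# With replacement: every draw is independent, so a unit may appear more than once
with_repl = rng.choices(population, k=10)

# Only sampling with replacement can draw more items than the population holds;
# rng.sample(population, 30) would raise ValueError
oversized = rng.choices(population, k=30)

print("without replacement:", without_repl)
print("with replacement:   ", with_repl)
```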
Systematic Sampling:
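o Selects every k-th element from an ordered list of the population, starting
from a randomly chosen position among the first k elements (k is about N/n).
o Provides a representative sample, provided the ordering of the list has no
periodic pattern that coincides with k.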
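A minimal sketch of systematic sampling, assuming the interval is k = N // n and the start is chosen uniformly from the first k positions. The function name `systematic_sample` is hypothetical, not a library API.

```python
import random

def systematic_sample(population, n, rng=None):
    """Return every k-th element of an ordered population, starting at a random offset."""
    if not 0 < n <= len(population):
        raise ValueError("n must be between 1 and the population size")
    rng = rng or random.Random()
    k = len(population) // n      # sampling interval
    start = rng.randrange(k)      # random start within the first interval
    return [population[start + i * k] for i in range(n)]

# Usage with a hypothetical ordered list of 100 units
units = list(range(100))
sample = systematic_sample(units, 10, random.Random(1))
print(sample)
```

Because n * k <= N, the last index start + (n - 1) * k never runs past the end of the list.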
Stratified Sampling:
o If the dataset is divided into mutually disjoint parts called strata,
a stratified sample is generated by drawing a simple random sample
from each stratum.
o This helps ensure a representative sample, especially when
the data are skewed.
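The slide does not say how many elements to take from each stratum; the sketch below assumes proportional allocation (each stratum contributes in proportion to its size). The function `stratified_sample` and the urban/rural customer data are hypothetical.

```python
import random
from collections import Counter, defaultdict

def stratified_sample(records, stratum_of, n, rng=None):
    """Draw a simple random sample inside each stratum, sized in proportion to the stratum.

    Proportional allocation is an assumption here; rounding can make the total
    differ slightly from n when the shares are not whole numbers.
    """
    rng = rng or random.Random()
    strata = defaultdict(list)
    for record in records:
        strata[stratum_of(record)].append(record)   # mutually disjoint strata
    total = len(records)
    sample = []
    for members in strata.values():
        n_h = round(n * len(members) / total)       # this stratum's share of n
        sample.extend(rng.sample(members, n_h))     # SRS within the stratum
    return sample

# Hypothetical population: 900 urban and 100 rural customers
customers = [(i, "urban") for i in range(900)] + [(i, "rural") for i in range(900, 1000)]
sample = stratified_sample(customers, lambda c: c[1], 50, random.Random(0))
print(dict(Counter(region for _, region in sample)))  # {'urban': 45, 'rural': 5}
```

Even the small rural group is guaranteed its 5 places, which a simple random sample of 50 would not ensure.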
Cluster Sampling:
o The population is divided into subgroups, known as clusters.
o One or more whole clusters are selected at random, and every element of
a selected cluster is included in the sample.
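A minimal sketch of one-stage cluster sampling, assuming clusters are given as a mapping from cluster label to members. The function name `cluster_sample` and the classroom data are hypothetical.

```python
import random

def cluster_sample(clusters, m, rng=None):
    """One-stage cluster sampling: pick m whole clusters at random and keep all their members."""
    rng = rng or random.Random()
    chosen = rng.sample(sorted(clusters), m)        # random selection of cluster labels
    return {label: clusters[label] for label in chosen}

# Hypothetical population: 12 classrooms (clusters) of 30 students each
classrooms = {f"class_{c}": [f"s{c}_{i}" for i in range(30)] for c in range(12)}
picked = cluster_sample(classrooms, 3, random.Random(3))
students = [s for members in picked.values() for s in members]
print(sorted(picked), len(students))
```

Randomness operates on clusters, not individuals: once a classroom is chosen, all 30 of its students are in the sample.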
Non-Probability Sampling:
Convenience Sampling:
o Elements are selected based on their availability and willingness to take
part.
o Prone to significant bias.
Quota Sampling:
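o The population is divided into groups by predetermined characteristics
(for example, age or gender), and elements are selected non-randomly until
a fixed quota for each group is filled.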
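A minimal sketch of quota sampling, assuming people arrive in some order and are accepted until each group's quota is full. The function `quota_sample`, the arrival list and the quotas are hypothetical.

```python
def quota_sample(arrivals, group_of, quotas):
    """Accept people in arrival order until every group's quota is filled.

    Selection within a group is not random, which is why quota sampling is a
    non-probability method.
    """
    remaining = dict(quotas)
    sample = []
    for person in arrivals:
        group = group_of(person)
        if remaining.get(group, 0) > 0:      # quota for this group still open
            sample.append(person)
            remaining[group] -= 1
        if not any(remaining.values()):      # all quotas met: stop recruiting
            break
    return sample

# Hypothetical arrivals as (name, gender); quotas of 2 per group
arrivals = [("a", "M"), ("b", "M"), ("c", "M"), ("d", "F"), ("e", "M"), ("f", "F"), ("g", "F")]
chosen = quota_sample(arrivals, lambda p: p[1], {"F": 2, "M": 2})
print([name for name, _ in chosen])  # ['a', 'b', 'd', 'f']
```

The sample matches the quotas, but who gets in depends on arrival order, so the early arrivals in each group are over-represented.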
Judgement Sampling:
o Also called selective sampling.
o Relies on the judgement of experts to choose whom to ask to
participate.
Snowball Sampling:
o Existing participants nominate further participants from among their
acquaintances, so the sample size grows like a rolling snowball.
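Snowball sampling can be modelled as a breadth-first walk over a referral network. A minimal sketch, assuming the nominations are known as a dictionary from each person to the people they refer; `snowball_sample` and the network are hypothetical.

```python
from collections import deque

def snowball_sample(referrals, seeds, max_size):
    """Grow a sample from seed participants by following their nominations (breadth-first)."""
    sample, seen = [], set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < max_size:
        person = queue.popleft()
        sample.append(person)                    # person joins the sample
        for friend in referrals.get(person, []):
            if friend not in seen:               # each person is recruited only once
                seen.add(friend)
                queue.append(friend)
    return sample

# Hypothetical referral network: who nominates whom
referrals = {"ana": ["ben", "cai"], "ben": ["dev", "ana"], "cai": ["eli"], "dev": [], "eli": ["fay"]}
sample = snowball_sample(referrals, ["ana"], 5)
print(sample)  # ['ana', 'ben', 'cai', 'dev', 'eli']
```

Anyone not connected to the seeds can never be reached, which is the main source of bias in this method.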
Text Books:
1. Bruce, Peter, and Andrew Bruce. Practical statistics for data scientists: 50
essential concepts. O'Reilly Media, 2017.
2. Rice, John A. Mathematical statistics and data analysis. Thomson Higher
Education.
Reference Books:
1. Dodge, Yadolah, ed. Statistical data analysis and inference. Elsevier, 2014.
2. Ismay, Chester, and Albert Y. Kim. Statistical Inference via Data Science: A
Modern Dive into R and the Tidyverse. CRC Press, 2019.
3. Milton, J. S., and J. C. Arnold. Introduction to probability and statistics.
4th ed. Tata McGraw Hill, 2007.
4. Johnson, R. A., and C. B. Gupta. Miller and Freund's probability and
statistics for engineers. 7th ed. Pearson Education Asia, 2007.
5. Chandrasekaran, A., and G. Kavitha. Probability, statistics, random processes
and queuing theory. Dhanam Publications, 2014.
Useful Links:
1. [Link]
2. [Link]
3. [Link]
and-data-analysis