Introduction to Business Statistics (Chapter 1)
1.1 Data
Data: Facts and figures used to draw conclusions.
Data Set: Collection of data for a specific study.
Elements: The entities (people, objects, events) being studied.
Variable: A characteristic of an element that can be measured.
o Quantitative Variable: Numerical values representing quantities (e.g., age, income).
o Qualitative Variable: Categorical values (e.g., gender, color).
Types of Data:
Cross-Sectional Data: Collected at the same point in time.
Time Series Data: Collected over different time periods (e.g., monthly sales data).
1.2 Data Sources
Existing Sources: Data already collected by others (e.g., government reports, libraries, internet).
Experimental Studies: Data collected by manipulating independent variables to observe effects on a response variable.
Observational Studies: Data collected without manipulating variables (e.g., surveys).
Steps in Initiating a Study:
1. Define the response variable (variable of interest).
2. Identify independent variables (related factors).
3. Decide if the study is experimental (manipulate variables) or observational (no manipulation).
1.3 Populations and Samples
Population: The entire set of elements of interest.
Census: Data collected from every element in the population.
Sample: A subset of the population used to draw conclusions about the entire population.
Descriptive Statistics vs. Statistical Inference:
Descriptive Statistics: Summarizes and describes the main features of a data set (e.g., mean, median).
Statistical Inference: Uses sample data to make conclusions about the population.
1.4 Case Studies on Sampling and Statistical Inference
Cell Phone Case: Estimating cell phone costs using sample data.
Marketing Research Case: Rating a new bottle design based on consumer feedback.
Car Mileage Case: Estimating average car mileage using sample data.
Importance of Random Sampling:
Ensures that the sample is representative of the population, reducing bias.
1.5 Scales of Measurement (Optional)
Nominal Scale: Categories with no order (e.g., gender, colors).
Ordinal Scale: Categories with a specific order (e.g., rankings, satisfaction levels).
Interval Scale: Numerical values with equal intervals but no true zero (e.g., temperature in Celsius).
Ratio Scale: Numerical values with equal intervals and a true zero (e.g., weight, income).
Key Takeaways:
Data is the foundation of statistical analysis, and understanding variables and data types is crucial.
Data Sources can be existing or collected through experimental/observational studies.
Populations and Samples help generalize findings from a subset to the entire group.
Random Sampling ensures unbiased and representative data collection.
Scales of Measurement help classify data for appropriate analysis.
Descriptive Statistics - Tabular and Graphical Methods
1. Summarizing Qualitative Data
Frequency Distribution: A table that summarizes the number (frequency) of items in each category.
o Relative Frequency: Proportion of items in each class (frequency ÷ total observations).
o Percent Frequency: Relative frequency multiplied by 100.
Graphical Methods:
o Bar Charts: Represent frequencies of categories using bars.
o Pie Charts: Show proportions of categories as slices of a pie.
o Pareto Chart: A bar chart where categories are ordered by frequency, highlighting the most significant categories.
2. Summarizing Quantitative Data
Frequency Distribution: Group quantitative data into classes (intervals) and count the number of observations in each class.
o Steps:
1. Determine the number of classes.
2. Calculate class length (range ÷ number of classes).
3. Form non-overlapping classes of equal width.
4. Tally and count observations in each class.
5. Graph the histogram.
Graphical Methods:
o Histogram: A bar chart for quantitative data, showing the distribution of data across classes.
o Frequency Polygon: A line graph connecting the midpoints of the tops of the bars in a histogram.
o Ogive: A line graph that shows cumulative frequencies.
3. Dot Plots
Definition: A simple graphical display where each data point is represented by a dot along a number line.
Use: Visualize the distribution of small data sets.
Interpretation: Clusters, gaps, and outliers can be easily identified.
4. Stem-and-Leaf Displays
Definition: A graphical method that splits each data point into a "stem" (leading digit(s)) and a "leaf" (trailing digit).
Use: Display the distribution of data while retaining the original values.
Example: For the number 23, the stem is 2 and the leaf is 3.
5. Contingency Tables (Optional)
Definition: A table that classifies data based on two dimensions (rows and columns).
Use: Examine relationships between two categorical variables.
Example: Rows could represent gender, and columns could represent product preferences.
6. Scatter Plots (Optional)
Definition: A graph that shows the relationship between two quantitative variables.
o X-axis: Independent variable.
o Y-axis: Dependent variable.
Types of Relationships:
o Linear: Data points form a straight line.
Positive: As one variable increases, the other increases.
Negative: As one variable increases, the other decreases.
o No Linear Relationship: No clear pattern between variables.
7. Misleading Graphs and Charts (Optional)
Common Issues:
o Scaling: Manipulating the axis scale to exaggerate or minimize trends.
o Truncated Axes: Starting the axis at a value other than zero to distort proportions.
o 3D Effects: Using 3D visuals that can distort the perception of data.
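As a rough illustration of the truncated-axis issue, the sketch below compares the true ratio of two values with the ratio of their drawn bar heights when the axis starts above zero. The function name apparent_ratio and the sales figures are illustrative assumptions, not taken from these notes.

```python
def apparent_ratio(a: float, b: float, axis_start: float = 0.0) -> float:
    """Ratio of drawn bar heights for values a and b when the axis begins at axis_start."""
    if axis_start >= min(a, b):
        raise ValueError("axis_start must be below both values")
    return (a - axis_start) / (b - axis_start)


# Hypothetical sales figures for two products.
sales_a, sales_b = 105.0, 100.0

true_ratio = apparent_ratio(sales_a, sales_b)             # axis starts at zero
distorted_ratio = apparent_ratio(sales_a, sales_b, 95.0)  # axis truncated at 95

print(f"ratio with a zero-based axis: {true_ratio:.2f}")
print(f"ratio with the axis starting at 95: {distorted_ratio:.2f}")
```

A 5% difference is drawn as a bar twice as tall once the axis starts at 95, which is why checking where the axis begins is a useful first step.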
How to Spot: Always check the axes, scales, and context of the graph.
Key Concepts
Frequency Distribution: Summarizes data by counting occurrences in categories or classes.
Bar Charts & Pie Charts: Used for qualitative data to show frequencies or proportions.
Histograms & Frequency Polygons: Used for quantitative data to show distributions.
Dot Plots & Stem-and-Leaf Displays: Simple graphical methods for small data sets.
Contingency Tables: Analyze relationships between two categorical variables.
Scatter Plots: Visualize relationships between two quantitative variables.
Misleading Graphs: Be cautious of graphs that distort data through scaling or visual effects.
Descriptive Statistics - Numerical Methods (Chapter 3)
3.1 Describing Central Tendency
Central Tendency: Represents the center or middle of a data set.
Measures of Central Tendency:
o Mean (μ): The average value. Calculated as the sum of all values divided by the number of values.
o Median (Md): The middle value when data is ordered. If there’s an even number of observations, it’s the average of the two middle values.
o Mode (Mo): The most frequently occurring value in the data set.
3.2 Measures of Variation
Variation: Describes how spread out the data is.
Measures of Variation:
o Range: The difference between the largest and smallest values.
o Variance: The average of the squared deviations from the mean.
o Standard Deviation: The square root of the variance. Measures the spread of data around the mean.
Empirical Rule:
For normal distributions:
o ~68% of data falls within ±1 standard deviation of the mean.
o ~95% within ±2 standard deviations.
o ~99.7% within ±3 standard deviations.
Chebyshev’s Theorem:
Applies to any distribution:
o At least 75% of data falls within ±2 standard deviations.
o At least 89% within ±3 standard deviations.
z-Scores:
Measures how many standard deviations a value (x) is from the mean.
o Positive z-score: x is above the mean.
o Negative z-score: x is below the mean.
o z = 0: x is equal to the mean.
3.3 Percentiles, Quartiles, and Box-and-Whiskers Displays
Percentile: A value below which a given percentage of data falls.
o 1st Quartile (Q1): 25th percentile.
o 2nd Quartile (Median): 50th percentile.
o 3rd Quartile (Q3): 75th percentile.
Interquartile Range (IQR): Q3 - Q1. Measures the spread of the middle 50% of data.
Box-and-Whisker Plot: Visualizes the distribution of data using quartiles, median, and outliers.
3.4 Covariance, Correlation, and Least Squares Line (Optional)
Covariance: Measures the relationship between two variables (x and y).
o Positive Covariance: As x increases, y increases.
o Negative Covariance: As x increases, y decreases.
Correlation Coefficient (r): Measures the strength
and direction of the linear relationship between two
variables.
o Ranges from -1 to 1.
o r = 1: Perfect positive correlation.
o r = -1: Perfect negative correlation.
o r = 0: No linear correlation.
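A minimal sketch of how covariance and r relate, assuming the sample (n - 1) form of the covariance, which these notes do not specify. The function names and data are illustrative only.

```python
import math


def covariance(xs, ys):
    """Sample covariance: sum of products of deviations, divided by n - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


def correlation(xs, ys):
    """Correlation coefficient r: covariance divided by both standard deviations."""
    std_x = math.sqrt(covariance(xs, xs))  # covariance of x with itself is its variance
    std_y = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (std_x * std_y)


# Hypothetical data: y_up rises with x, y_down falls with x.
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]
y_down = [10, 8, 6, 4, 2]

print(covariance(x, y_up), correlation(x, y_up))
print(covariance(x, y_down), correlation(x, y_down))
```

Dividing by the two standard deviations is what removes the units and confines r to the range -1 to 1. The sketch does not handle a variable with zero spread, where r is undefined.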
Least Squares Line: A line that minimizes the sum of
squared differences between observed and predicted
values (used in regression analysis).
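The sketch below fits a least squares line using the standard closed-form slope and intercept for one predictor. The function names and the data points are assumptions made for illustration.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the line minimizing the sum of squared errors."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def sum_squared_errors(xs, ys, intercept, slope):
    """Sum of squared differences between observed and predicted y values."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical observations.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = least_squares_line(x, y)
print(f"predicted y = {b0:.2f} + {b1:.2f} * x")
print(f"sum of squared errors: {sum_squared_errors(x, y, b0, b1):.4f}")
```

Nudging either the intercept or the slope away from the fitted values increases the sum of squared errors, which is the sense in which the line is "least squares".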
3.5 Weighted Means and Grouped Data (Optional)
Weighted Mean: Used when some data points are
more important than others. Calculated by
multiplying each value by its weight, summing the
products, and dividing by the sum of the weights.
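A short sketch of the weighted mean calculation. The grades and credit hours are hypothetical values chosen for illustration.

```python
def weighted_mean(values, weights):
    """Sum of value * weight, divided by the sum of the weights."""
    if len(values) != len(weights) or not values:
        raise ValueError("need equal-length, non-empty lists")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical course grades weighted by credit hours.
grades = [90, 80, 70]
credit_hours = [3, 4, 1]

print(weighted_mean(grades, credit_hours))  # credit-weighted average
print(sum(grades) / len(grades))            # unweighted average, for comparison
```

The weighted result sits above the plain average here because the lowest grade carries the fewest credit hours.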
Grouped Data: Data organized into intervals. Mean
and standard deviation can be estimated using
midpoint values and frequencies.
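One way to sketch the grouped-data estimates is to treat every observation in a class as if it sat at the class midpoint. The class boundaries and frequencies below are made up, and the n - 1 divisor assumes the data are a sample.

```python
import math


def grouped_mean_and_std(classes, frequencies):
    """Estimate the mean and sample standard deviation from class intervals and counts."""
    if len(classes) != len(frequencies):
        raise ValueError("each class needs a frequency")
    n = sum(frequencies)
    if n < 2:
        raise ValueError("need at least two observations")
    midpoints = [(low + high) / 2 for low, high in classes]
    mean = sum(f * m for f, m in zip(frequencies, midpoints)) / n
    variance = sum(f * (m - mean) ** 2 for f, m in zip(frequencies, midpoints)) / (n - 1)
    return mean, math.sqrt(variance)


# Hypothetical frequency distribution: (lower, upper) class limits and counts.
classes = [(0, 10), (10, 20), (20, 30)]
frequencies = [2, 5, 3]

mean, std = grouped_mean_and_std(classes, frequencies)
print(f"estimated mean: {mean:.2f}, estimated standard deviation: {std:.2f}")
```

These are approximations: the true values depend on where the observations actually fall inside each class.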
3.6 Geometric Mean (Optional)
Geometric Mean: Used for rates of return or growth
rates.
o Calculated as the nth root of the product
(1 + R₁) × (1 + R₂) × ... × (1 + Rₙ), minus 1, where
Rᵢ are the rates of return.
o Useful for calculating average growth over
multiple periods.
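A minimal sketch of the geometric mean rate of return. Subtracting 1 converts the average growth factor back into a rate. The function name and the two-period returns are illustrative assumptions.

```python
import math


def geometric_mean_return(returns):
    """nth root of the product of (1 + R_i), minus 1."""
    if not returns:
        raise ValueError("need at least one rate of return")
    growth = math.prod(1 + r for r in returns)
    if growth < 0:
        raise ValueError("total growth factor must be non-negative")
    return growth ** (1 / len(returns)) - 1


# Hypothetical returns: +10% in period 1, then -10% in period 2.
returns = [0.10, -0.10]

print(f"arithmetic mean return: {sum(returns) / len(returns):.4%}")
print(f"geometric mean return:  {geometric_mean_return(returns):.4%}")
```

The arithmetic mean of these two returns is zero, yet the investment ends below its starting value. The geometric mean reflects that loss, which is why it suits multi-period growth.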
Key Takeaways:
Central Tendency: Mean, median, and mode
describe the center of data.
Variation: Range, variance, and standard deviation
measure data spread.
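Percentiles and Quartiles: Help understand data
distribution and identify outliers.
Covariance and Correlation: Measure relationships
between variables.
Weighted Mean and Geometric Mean: Useful for
specialized data analysis.
Probability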
Discrete Random Variable
Continuous Random Variable
Sampling and Sampling Distribution
5. Stratified Random, Cluster, and Systematic Sampling (Optional)
Stratified Random Sampling:
o Divide the population into non-overlapping
groups (strata) based on similarity.
o Randomly sample from each stratum.
o Combine the samples to form the full
sample.
o Use: When the population has distinct
subgroups (e.g., age, gender, income).
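The stratified steps above can be sketched as follows. Sampling the same fraction from every stratum (proportional allocation) is an assumption, as are the function name and the toy population.

```python
import random
from collections import defaultdict


def stratified_sample(population, stratum_of, fraction, seed=None):
    """Group elements into strata, sample each stratum, and combine the results."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for element in population:
        strata[stratum_of(element)].append(element)
    sample = []
    for members in strata.values():
        k = max(1, round(fraction * len(members)))  # at least one element per stratum
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical population of (id, age_group) pairs: 60 under 30, 40 aged 30 or more.
population = [(i, "under_30" if i < 60 else "30_plus") for i in range(100)]

sample = stratified_sample(population, stratum_of=lambda e: e[1], fraction=0.1, seed=1)
print(len(sample), sum(1 for _, group in sample if group == "under_30"))
```

Because each stratum is sampled separately, every subgroup is guaranteed to appear in the sample in proportion to its size.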
Cluster Sampling:
o Divide the population into clusters (e.g.,
schools, neighborhoods).
o Randomly select entire clusters for
sampling.
o Use: When it is difficult to sample
individuals directly.
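A small sketch of cluster sampling, in which whole clusters are drawn at random and every element in a chosen cluster is included. The school names and student labels are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick n_clusters cluster names and return them with all their elements."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = [element for name in chosen for element in clusters[name]]
    return chosen, sample


# Hypothetical population: 6 schools with 5 students each.
schools = {
    f"school_{i}": [f"student_{i}_{j}" for j in range(5)]
    for i in range(6)
}

chosen, sample = cluster_sample(schools, n_clusters=2, seed=3)
print(chosen)
print(len(sample))
```

Only a list of clusters is needed to draw the sample, not a list of every individual, which is the practical appeal of the method.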
Systematic Sampling:
o Select every k-th element from a list after
a random start.
o Use: When the population is ordered in
some way.
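Systematic sampling can be sketched with list slicing: choose a random start among the first k positions, then take every k-th element. The function name and the customer list are illustrative assumptions.

```python
import random


def systematic_sample(items, k, seed=None):
    """Take every k-th element after a random start within the first k positions."""
    if k < 1 or k > len(items):
        raise ValueError("k must be between 1 and the number of items")
    rng = random.Random(seed)
    start = rng.randrange(k)  # random index in 0..k-1
    return items[start::k]


# Hypothetical ordered list of 100 customer numbers.
customers = list(range(1, 101))

sample = systematic_sample(customers, k=10, seed=42)
print(sample)
```

With 100 customers and k = 10 the sample always has 10 elements, spaced 10 apart. Only the starting point is random.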
6. Surveys and Errors in Survey Sampling (Optional)
Types of Survey Questions:
o Dichotomous: Yes/No questions.
o Multiple Choice: List of options to choose
from.
o Open-Ended: Respondents answer in their
own words.
Sources of Error:
o Sampling Error: The difference between a sample
statistic and the population value, arising because
only part of the population is observed.
o Non-Sampling Error: Errors due to data
collection, processing, or respondent bias.
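Sampling error can be illustrated with a simulated population in which the true mean is known, which is rarely the case in practice. Everything below (the uniform population, the sizes, the names) is an assumption made for illustration.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical population: 10,000 monthly bills between 20 and 120.
population = [rng.uniform(20, 120) for _ in range(10_000)]
population_mean = statistics.mean(population)


def sampling_error(sample_size):
    """Sample mean minus population mean for one simple random sample."""
    sample = rng.sample(population, sample_size)
    return statistics.mean(sample) - population_mean


for size in (10, 100, 1_000):
    print(f"n = {size}: sampling error = {sampling_error(size):+.3f}")
```

The error changes from sample to sample and tends to shrink as the sample size grows. It is zero only for a census. Non-sampling errors are not captured by this simulation.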
Key Concepts
Random Sampling: Ensures every possible sample of a
given size has an equal chance of being selected.
Sampling Distribution: The distribution of a statistic
(e.g., mean, proportion) over all possible samples.
Central Limit Theorem: The sampling distribution of
the mean is approximately normal for large sample
sizes n, regardless of the shape of the population.
Stratified, Cluster, and Systematic Sampling:
Alternative sampling methods for specific scenarios.
Survey Errors: Sampling and non-sampling errors
can affect the accuracy of survey results.
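The sampling distribution and the Central Limit Theorem listed above can be explored with a small simulation. This is a sketch under stated assumptions: a skewed (exponential) population, samples drawn with replacement, and arbitrary sizes chosen so that it runs quickly.

```python
import random
import statistics


def sample_means(population, n, num_samples, seed=None):
    """Means of num_samples random samples of size n, drawn with replacement."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.choices(population, k=n)) for _ in range(num_samples)]


# Hypothetical right-skewed population.
pop_rng = random.Random(0)
population = [pop_rng.expovariate(1.0) for _ in range(20_000)]
pop_mean = statistics.fmean(population)
pop_sd = statistics.pstdev(population)

n = 50
means = sample_means(population, n=n, num_samples=2_000, seed=1)
standard_error = pop_sd / n ** 0.5

# Share of sample means within two standard errors of the population mean.
within_two = sum(abs(m - pop_mean) <= 2 * standard_error for m in means) / len(means)

print(f"mean of sample means: {statistics.fmean(means):.3f} (population mean {pop_mean:.3f})")
print(f"std dev of sample means: {statistics.stdev(means):.3f} (sigma / sqrt(n) = {standard_error:.3f})")
print(f"share within 2 standard errors: {within_two:.3f}")
```

Although the population is strongly skewed, the sample means cluster around the population mean with a spread close to the population standard deviation divided by the square root of n, and roughly 95% of them fall within two standard errors, as a normal distribution would suggest.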
Confidence Intervals