EDA - Reviewer Midterm

Uploaded by

s.bakansa

Introduction to Business Statistics (Chapter 1)

1.1 Data
• Data: Facts and figures used to draw conclusions.
• Data Set: Collection of data for a specific study.
• Elements: The entities (people, objects, events) being studied.
• Variable: A characteristic of an element that can be measured.
  o Quantitative Variable: Numerical values representing quantities (e.g., age, income).
  o Qualitative Variable: Categorical values (e.g., gender, color).

Types of Data:
• Cross-Sectional Data: Collected at the same point in time.
• Time Series Data: Collected over different time periods (e.g., monthly sales data).

1.2 Data Sources
• Existing Sources: Data already collected by others (e.g., government reports, libraries, internet).
• Experimental Studies: Data collected by manipulating independent variables to observe effects on a response variable.
• Observational Studies: Data collected without manipulating variables (e.g., surveys).

Steps in Initiating a Study:
1. Define the response variable (variable of interest).
2. Identify independent variables (related factors).
3. Decide if the study is experimental (manipulate variables) or observational (no manipulation).

1.3 Populations and Samples
• Population: The entire set of elements of interest.
• Census: Data collected from every element in the population.
• Sample: A subset of the population used to draw conclusions about the entire population.

Descriptive Statistics vs. Statistical Inference:
• Descriptive Statistics: Summarizes and describes the main features of a data set (e.g., mean, median).
• Statistical Inference: Uses sample data to make conclusions about the population.

1.4 Case Studies on Sampling and Statistical Inference
• Cell Phone Case: Estimating cell phone costs using sample data.
• Marketing Research Case: Rating a new bottle design based on consumer feedback.
• Car Mileage Case: Estimating average car mileage using sample data.

Importance of Random Sampling:
• Ensures that the sample is representative of the population, reducing bias.

1.5 Scales of Measurement (Optional)
• Nominal Scale: Categories with no order (e.g., gender, colors).
• Ordinal Scale: Categories with a specific order (e.g., rankings, satisfaction levels).
• Interval Scale: Numerical values with equal intervals but no true zero (e.g., temperature in Celsius).
• Ratio Scale: Numerical values with equal intervals and a true zero (e.g., weight, income).

Key Takeaways:
• Data is the foundation of statistical analysis, and understanding variables and data types is crucial.
• Data Sources can be existing or collected through experimental/observational studies.
• Populations and Samples help generalize findings from a subset to the entire group.
• Random Sampling ensures unbiased and representative data collection.
• Scales of Measurement help classify data for appropriate analysis.

Descriptive Statistics - Tabular and Graphical Methods

1. Summarizing Qualitative Data
• Frequency Distribution: A table that summarizes the number (frequency) of items in each category.
  o Relative Frequency: Proportion of items in each class (frequency ÷ total observations).
  o Percent Frequency: Relative frequency multiplied by 100.
• Graphical Methods:
  o Bar Charts: Represent frequencies of categories using bars.
  o Pie Charts: Show proportions of categories as slices of a pie.
  o Pareto Chart: A bar chart where categories are ordered by frequency, highlighting the most significant categories.

2. Summarizing Quantitative Data
• Frequency Distribution: Group quantitative data into classes (intervals) and count the number of observations in each class.
  o Steps:
    1. Determine the number of classes.
    2. Calculate class length (range ÷ number of classes).
    3. Form non-overlapping classes of equal width.
    4. Tally and count observations in each class.
    5. Graph the histogram.
• Graphical Methods:
  o Histogram: A bar chart for quantitative data, showing the distribution of data across classes.
  o Frequency Polygon: A line graph connecting the midpoints of the tops of the bars in a histogram.
  o Ogive: A line graph that shows cumulative frequencies.

3. Dot Plots
• Definition: A simple graphical display where each data point is represented by a dot along a number line.
• Use: Visualize the distribution of small data sets.
• Interpretation: Clusters, gaps, and outliers can be easily identified.

4. Stem-and-Leaf Displays
• Definition: A graphical method that splits each data point into a "stem" (leading digit(s)) and a "leaf" (trailing digit).
• Use: Display the distribution of data while retaining the original values.
• Example: For the number 23, the stem is 2 and the leaf is 3.

5. Contingency Tables (Optional)
• Definition: A table that classifies data based on two dimensions (rows and columns).
• Use: Examine relationships between two categorical variables.
• Example: Rows could represent gender, and columns could represent product preferences.

6. Scatter Plots (Optional)
• Definition: A graph that shows the relationship between two quantitative variables.
  o X-axis: Independent variable.
  o Y-axis: Dependent variable.
• Types of Relationships:
  o Linear: Data points fall approximately along a straight line.
    • Positive: As one variable increases, the other increases.
    • Negative: As one variable increases, the other decreases.
  o No Linear Relationship: No clear pattern between variables.

7. Misleading Graphs and Charts (Optional)
• Common Issues:
  o Scaling: Manipulating the axis scale to exaggerate or minimize trends.
  o Truncated Axes: Starting the axis at a value other than zero to distort proportions.
  o 3D Effects: Using 3D visuals that can distort the perception of data.
• How to Spot: Always check the axes, scales, and context of the graph.

Key Concepts
• Frequency Distribution: Summarizes data by counting occurrences in categories or classes.
• Bar Charts & Pie Charts: Used for qualitative data to show frequencies or proportions.
• Histograms & Frequency Polygons: Used for quantitative data to show distributions.
• Dot Plots & Stem-and-Leaf Displays: Simple graphical methods for small data sets.
• Contingency Tables: Analyze relationships between two categorical variables.
• Scatter Plots: Visualize relationships between two quantitative variables.
• Misleading Graphs: Be cautious of graphs that distort data through scaling or visual effects.

Descriptive Statistics - Numerical Methods (Chapter 3)

3.1 Describing Central Tendency
• Central Tendency: Represents the center or middle of a data set.
• Measures of Central Tendency:
  o Mean (μ): The average value. Calculated as the sum of all values divided by the number of values.
  o Median (Md): The middle value when data is ordered. If there’s an even number of observations, it’s the average of the two middle values.
  o Mode (Mo): The most frequently occurring value in the data set.

3.2 Measures of Variation
• Variation: Describes how spread out the data is.
• Measures of Variation:
  o Range: The difference between the largest and smallest values.
  o Variance: The average of the squared deviations from the mean (a sample variance divides by n − 1 rather than n).
  o Standard Deviation: The square root of the variance. Measures the spread of data around the mean.

Empirical Rule:
• For normal distributions:
  o ~68% of data falls within ±1 standard deviation of the mean.
  o ~95% within ±2 standard deviations.
  o ~99.7% within ±3 standard deviations.

Chebyshev’s Theorem:
• Applies to any distribution (in general, at least 1 − 1/k² of the data falls within ±k standard deviations of the mean, for k > 1):
  o At least 75% of data falls within ±2 standard deviations.
  o At least 89% within ±3 standard deviations.

z-Scores:
• Measures how many standard deviations a value (x) is from the mean.
  o Positive z-score: x is above the mean.
  o Negative z-score: x is below the mean.
  o z = 0: x is equal to the mean.

3.3 Percentiles, Quartiles, and Box-and-Whiskers Displays
• Percentile: A value below which a given percentage of data falls.
  o 1st Quartile (Q1): 25th percentile.
  o 2nd Quartile (Median): 50th percentile.
  o 3rd Quartile (Q3): 75th percentile.
• Interquartile Range (IQR): Q3 - Q1. Measures the spread of the middle 50% of data.
• Box-and-Whisker Plot: Visualizes the distribution of data using quartiles, median, and outliers.

3.4 Covariance, Correlation, and Least Squares Line (Optional)
• Covariance: Measures the relationship between two variables (x and y).
  o Positive Covariance: As x increases, y increases.
  o Negative Covariance: As x increases, y decreases.
• Correlation Coefficient (r): Measures the strength and direction of the linear relationship between two variables.
  o Ranges from -1 to 1.
  o r = 1: Perfect positive correlation.
  o r = -1: Perfect negative correlation.
  o r = 0: No linear correlation.
• Least Squares Line: A line that minimizes the sum of squared differences between observed and predicted values (used in regression analysis).

3.5 Weighted Means and Grouped Data (Optional)
• Weighted Mean: Used when some data points are more important than others. Calculated by multiplying each value by its weight, summing the products, and dividing by the sum of the weights.
• Grouped Data: Data organized into intervals. Mean and standard deviation can be estimated using midpoint values and frequencies.

3.6 Geometric Mean (Optional)
• Geometric Mean: Used for rates of return or growth rates.
  o Calculated as the nth root of the product (1 + R₁) × (1 + R₂) × ... × (1 + Rₙ), minus 1, where Rᵢ are the rates of return.
  o Useful for calculating average growth over multiple periods.

Key Takeaways:
• Central Tendency: Mean, median, and mode describe the center of data.
• Variation: Range, variance, and standard deviation measure data spread.
• Percentiles and Quartiles: Help understand data distribution and identify outliers.
• Covariance and Correlation: Measure relationships between variables.
• Weighted Mean and Geometric Mean: Useful for specialized data analysis.

Probability
Discrete Random Variable
Continuous Random Variable
Sampling and Sampling Distribution

5. Stratified Random, Cluster, and Systematic Sampling (Optional)

• Stratified Random Sampling:
  o Divide the population into non-overlapping groups (strata) based on similarity.
  o Randomly sample from each stratum.
  o Combine the samples to form the full sample.
  o Use: When the population has distinct subgroups (e.g., age, gender, income).
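The three stratified-sampling steps can be sketched in a few lines of Python. This is only an illustration: the function name `stratified_sample`, the `fraction` parameter (same sampling fraction in every stratum), and the made-up population of 100 people are assumptions, not part of the course material.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, fraction, seed=0):
    """Draw a simple random sample of the same fraction from every stratum."""
    rng = random.Random(seed)
    # Step 1: divide the population into non-overlapping strata.
    strata = defaultdict(list)
    for element in population:
        strata[stratum_of(element)].append(element)
    # Steps 2 and 3: sample at random within each stratum, then combine.
    sample = []
    for members in strata.values():
        k = max(1, round(len(members) * fraction))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 60 people under 40 and 40 people aged 40 or over.
people = [{"id": i, "age_group": "under_40" if i < 60 else "40_plus"}
          for i in range(100)]
chosen = stratified_sample(people, lambda p: p["age_group"], fraction=0.1)
```

Because each stratum is sampled separately, every subgroup appears in the sample in proportion to its size in the population.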

• Cluster Sampling:
  o Divide the population into clusters (e.g., schools, neighborhoods).
  o Randomly select entire clusters for sampling.
  o Use: When it is difficult to sample individuals directly.
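A minimal sketch of cluster sampling, assuming the clusters are already known and stored in a dictionary. The name `cluster_sample` and the six made-up schools are illustrative assumptions.

```python
import random

def cluster_sample(clusters, n_clusters, seed=0):
    """Pick whole clusters at random and keep every member of each one."""
    rng = random.Random(seed)
    # Randomly select cluster names, not individual elements.
    picked = rng.sample(sorted(clusters), n_clusters)
    # Everyone inside a selected cluster enters the sample.
    sample = [member for name in picked for member in clusters[name]]
    return picked, sample

# Hypothetical population: 6 schools with 5 students each.
schools = {f"school_{i}": [f"s{i}_{j}" for j in range(5)] for i in range(6)}
picked, students = cluster_sample(schools, n_clusters=2)
```

Only a list of clusters is needed to draw the sample, which is why the method suits populations with no complete list of individuals.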

• Systematic Sampling:
  o Select every k-th element from a list after a random start.
  o Use: When the population is ordered in some way.
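Systematic sampling reduces to list slicing once a random start is chosen. In this sketch the function name `systematic_sample` and the 100-entry roster are assumptions made for illustration.

```python
import random

def systematic_sample(items, k, seed=0):
    """Take every k-th element after a random start among the first k."""
    rng = random.Random(seed)
    start = rng.randrange(k)  # random index from 0 to k - 1
    return items[start::k]

# Hypothetical ordered list of 100 ID numbers.
roster = list(range(1, 101))
sample = systematic_sample(roster, k=10)
```

With 100 elements and k = 10 the sample always has 10 elements, spaced evenly through the list.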

6. Surveys and Errors in Survey Sampling (Optional)

• Types of Survey Questions:
  o Dichotomous: Yes/No questions.
  o Multiple Choice: List of options to choose from.
  o Open-Ended: Respondents answer in their own words.

• Sources of Error:
  o Sampling Error: The difference between a sample result and the true population value, which arises because only part of the population is observed.
  o Non-Sampling Error: Errors due to data collection, processing, or respondent bias.
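Sampling error can be illustrated with a small simulation. This is a sketch under stated assumptions: the population is artificial (10,000 values drawn from a normal distribution with mean 50 and standard deviation 10), and the helper name `sampling_error` is made up. Non-sampling error is not modeled here.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population with a known mean.
population = [rng.gauss(50, 10) for _ in range(10_000)]
mu = statistics.fmean(population)

def sampling_error(n):
    """Sample mean minus population mean for one random sample of size n."""
    sample = rng.sample(population, n)
    return statistics.fmean(sample) - mu

# Repeat the experiment 200 times for a small and a large sample size.
small_n_errors = [abs(sampling_error(25)) for _ in range(200)]
large_n_errors = [abs(sampling_error(400)) for _ in range(200)]

typical_small = statistics.fmean(small_n_errors)
typical_large = statistics.fmean(large_n_errors)
```

In this setup the typical error for n = 400 is noticeably smaller than for n = 25: a larger random sample reduces sampling error but does not remove it.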

Key Concepts

• Random Sampling: Gives every possible sample of a given size an equal chance of being selected.
• Sampling Distribution: The distribution of a statistic (e.g., mean, proportion) over all possible samples of the same size.
• Central Limit Theorem: The sampling distribution of the sample mean is approximately normal for large n.
• Stratified, Cluster, and Systematic Sampling: Alternative sampling methods for specific scenarios.
• Survey Errors: Sampling and non-sampling errors can affect the accuracy of survey results.
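The sampling distribution and the Central Limit Theorem can be checked by simulation. In this sketch the skewed population (exponential with mean 1 and standard deviation 1), the sample size n = 50, and the 2,000 repetitions are all assumptions chosen for illustration. The comparison value σ/√n for the spread of the sample mean is a standard result that these notes do not state.

```python
import random
import statistics

rng = random.Random(1)

def sample_means(n, reps=2000):
    """Approximate the sampling distribution of the mean by simulation."""
    return [statistics.fmean([rng.expovariate(1.0) for _ in range(n)])
            for _ in range(reps)]

means = sample_means(n=50)
center = statistics.fmean(means)   # should be close to the population mean, 1
spread = statistics.stdev(means)   # should be close to 1 / sqrt(50)

# Share of sample means within 2 standard deviations of their center.
within_2sd = sum(abs(m - center) <= 2 * spread for m in means) / len(means)
```

Although the population is strongly skewed, roughly 95% of the sample means land within two standard deviations of their center, as the Empirical Rule predicts for an approximately normal distribution.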

Confidence Intervals
