Research Methodology
Module-5
-Dr. Jay Prakash Verma
Ph.D., MBA, M.Com., B.Com(H), UGC-NET
Associate Dean & Associate Professor- Author
Descriptive Statistics
AGENDA:
• Measures of central tendency: Mean, Median, Mode
• Dispersion: Range, Variance, Standard Deviation, Skewness and Kurtosis
• Measures of relationship: Correlation
• Sampling and non-sampling errors
• Degrees of freedom and standard error
• Univariate and bivariate analysis
Descriptive Statistics in Business Research
• Understand fundamental statistical
concepts for business data analysis
• Apply appropriate statistical
measures to describe business
phenomena
• Interpret statistical results in
business decision-making contexts
• Develop critical analytical skills
for research and data-driven
decision making
What is Descriptive Statistics?
• Definition: Methods used to summarize,
organize, and simplify data.
• Purpose: Transforms raw data into
meaningful information.
• Role in Business Research:
• Foundation for data analysis and
interpretation.
• Provides clear summaries of complex
business data.
• Enables pattern identification and trend
analysis.
• Forms basis for more advanced
statistical analysis.
Types of Descriptive Statistics
Measures of Central Tendency: Identify the center of a data
distribution.
Measures of Dispersion: Describe the spread or variability of
data.
Measures of Distribution Shape: Describe asymmetry and
peakedness.
Measures of Relationship: Quantify connections between
variables.
Introduction to Central Tendency
• Definition: Statistical measures that identify the center or middle of a data set.
• Importance in Business:
• Provides representative values for business metrics.
• Allows comparison between different data sets.
• Simplifies complex data for decision-making.
• Common Measures: Mean, Median, Mode
Mean
• Definition: Arithmetic average of all values in a data set
• Formula: x̄ = Σxi / n for raw data; for a frequency distribution, x̄ = Σfixi / Σfi
• Business Applications:
• Average sales figures
• Average customer spending
• Average production costs
• Strengths: Uses all data points, mathematically precise
• Limitations: Sensitive to outliers and extreme values
Mean - Examples and Practice
• Example 1: Monthly sales figures for Q1
($000s): $120, $145, $130
• Mean = ($120 + $145 + $130)/3 = $131.67
• Example 2: Customer wait times (minutes): 3,
5, 8, 12, 7
• Mean = (3 + 5 + 8 + 12 + 7)/5 = 7 minutes
• Practice Problem: Calculate mean employee
productivity scores: 78, 82, 95, 67, 88, 91
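The worked examples and the practice problem can be checked with a short Python sketch. The helper names `arithmetic_mean` and `weighted_mean` are illustrative and do not come from any library. `weighted_mean` applies the frequency form x̄ = Σfixi / Σfi to a {value: frequency} table. That table is built from the product ratings in the Mode examples, and it should agree with the plain average of the raw list.

```python
def arithmetic_mean(values):
    """x̄ = Σxi / n for a list of raw observations."""
    return sum(values) / len(values)


def weighted_mean(freq_table):
    """x̄ = Σfixi / Σfi for a frequency table of the form {value: frequency}."""
    total = sum(f * x for x, f in freq_table.items())
    return total / sum(freq_table.values())


q1_sales = [120, 145, 130]               # $000s (Example 1)
wait_times = [3, 5, 8, 12, 7]            # minutes (Example 2)
productivity = [78, 82, 95, 67, 88, 91]  # practice problem

print(round(arithmetic_mean(q1_sales), 2))  # 131.67
print(arithmetic_mean(wait_times))          # 7.0
print(arithmetic_mean(productivity))        # 83.5

# Product ratings 4, 3, 5, 4, 2, 4, 5 summarized as a frequency table
ratings_freq = {2: 1, 3: 1, 4: 3, 5: 2}
print(round(weighted_mean(ratings_freq), 3))  # 3.857, same as averaging the raw list
```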
Median
• Definition: Middle value when data is arranged in ascending or descending order.
• Calculation:
• For an odd number of observations: the middle value
• For an even number: the average of the two middle values
• Business Applications:
• Median income of target market
• Median product prices
• Median response times
• Strengths: Not affected by extreme values, suitable for ordinal data
• Limitations: Ignores the actual magnitudes of most observations
Median - Examples and Practice
• Example 1: House prices ($000s): $350, $280, $420, $315, $550
• Arranged: $280, $315, $350, $420, $550
• Median = $350
• Example 2: Employee tenure (years): 2, 8, 5, 10, 12, 4
• Arranged: 2, 4, 5, 8, 10, 12
• Median = (5 + 8)/2 = 6.5
• Practice Problem: Find median customer satisfaction rating: 3, 5,
2, 4, 3, 5, 1
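The odd/even rule above can be sketched in Python and cross-checked against the standard library's `statistics.median`. `median_by_hand` is an illustrative name.

```python
from statistics import median


def median_by_hand(values):
    """Sort the data, then take the middle value (odd n) or the mean of the two middle values (even n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


house_prices = [350, 280, 420, 315, 550]   # $000s (Example 1)
tenure = [2, 8, 5, 10, 12, 4]              # years (Example 2)
satisfaction = [3, 5, 2, 4, 3, 5, 1]       # practice problem

for data in (house_prices, tenure, satisfaction):
    result = median_by_hand(data)
    assert result == median(data)  # cross-check against the standard library
    print(sorted(data), "->", result)
```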
Mode
• Definition: Most frequently occurring value in a data set
• Characteristics:
• May have multiple modes (bimodal, multimodal)
• May not exist if no values repeat
• Business Applications:
• Most common product size purchased
• Most frequent customer complaints
• Most common price point
• Strengths: Works with nominal data, identifies most common value
• Limitations: May not be representative of the data set
Mode - Examples and Practice
• Example 1: Product ratings (1-5 scale): 4, 3, 5, 4,
2, 4, 5
• Mode = 4 (occurs three times)
• Example 2: Customer age groups: 20s, 30s, 40s,
30s, 20s, 30s, 50s
• Mode = 30s (occurs three times)
• Practice Problem: Identify mode in marketing
channel conversions: Email, Social, Email, Direct,
Search, Social, Email
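Python's standard library already covers the multiple-mode case. `statistics.multimode` (Python 3.8+) returns every value tied for the highest count, so it handles bimodal data directly. One caveat: if no value repeats, it returns every value. You need to check that the top count is greater than 1 to detect the "no mode" case.

```python
from collections import Counter
from statistics import multimode

ratings = [4, 3, 5, 4, 2, 4, 5]                                  # Example 1
age_groups = ["20s", "30s", "40s", "30s", "20s", "30s", "50s"]   # Example 2
channels = ["Email", "Social", "Email", "Direct", "Search", "Social", "Email"]  # practice

for data in (ratings, age_groups, channels):
    modes = multimode(data)             # every value tied for the highest count
    top_count = Counter(data)[modes[0]]
    print(modes, "occurs", top_count, "times")
```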
Comparing Measures of Central Tendency
When to Use Mean:
• Data is approximately normally distributed
• No significant outliers
• Interval or ratio data
When to Use Median:
• Data is skewed
• Outliers present
• Ordinal data
When to Use Mode:
• Nominal data
• Interest in most common category
• Multimodal distributions for segment identification
Introduction to Dispersion
• Definition: Measures how spread out or scattered the data values are.
• Importance in Business:
• Indicates data reliability and consistency
• Reveals variability in business metrics
• Helps assess risk and uncertainty
• Common Measures: Range, Variance, Standard Deviation
Range
• Definition: Difference between the maximum and minimum values.
• Formula: Range = Maximum value - Minimum value
• Business Applications:
• Price ranges in market analysis
• Production output variability
• Customer spending range
• Strengths: Simple to calculate and understand
• Limitations: Uses only two data points and ignores how the remaining values are distributed
Range - Examples and Practice
• Example 1: Monthly revenue ($000s): $80, $95, $110, $88, $92
• Range = $110 - $80 = $30
• Example 2: Product defect rates (%): 2.1, 1.8, 3.2, 2.5, 1.9
• Range = 3.2 - 1.8 = 1.4
• Practice Problem: Calculate range of delivery times (days): 3, 7, 2, 5, 8, 4
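The range formula is a one-liner in Python. In this minimal sketch, `data_range` is an illustrative name. The defect-rate result is rounded because subtracting decimal fractions in floating point leaves tiny representation noise.

```python
def data_range(values):
    """Range = maximum value - minimum value."""
    return max(values) - min(values)


monthly_revenue = [80, 95, 110, 88, 92]     # $000s (Example 1)
defect_rates = [2.1, 1.8, 3.2, 2.5, 1.9]    # % (Example 2)
delivery_days = [3, 7, 2, 5, 8, 4]          # practice problem

print(data_range(monthly_revenue))          # 30
print(round(data_range(defect_rates), 2))   # 1.4
print(data_range(delivery_days))            # 6
```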
Variance
• Definition: Average of squared deviations from the mean
• Formula: σ² = ∑ (xi - μ)² / N (population)
• Sample Variance: s² = ∑ (xi - x̄)² / (n - 1) (sample)
• Business Applications:
• Analyzing variability in financial returns
• Quality control measurements
• Assessing consistency in performance metrics
• Limitations: Expressed in squared units, not the same units as the original data
Variance - Calculation Steps
• Calculate the mean of the data set
• Subtract the mean from each data point
• Square each deviation
• Sum the squared deviations
• Divide by N (population) or n - 1 (sample)
• Example: Customer wait times (minutes): 5, 8, 4, 10, 7
• Mean = 6.8 minutes
• Deviations: -1.8, 1.2, -2.8, 3.2, 0.2
• Squared deviations: 3.24, 1.44, 7.84, 10.24, 0.04
• Sum of squared deviations = 22.8
• Sample variance = 22.8/4 = 5.7 (minutes²)
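The five steps map directly onto code. This is a minimal sketch with an illustrative function name, `variance_by_steps`. Its results are cross-checked against `statistics.variance` (divides by n - 1) and `statistics.pvariance` (divides by N).

```python
from statistics import pvariance, variance


def variance_by_steps(values, sample=True):
    """Follow the five steps: mean, deviations, squares, sum, divide."""
    n = len(values)
    mean = sum(values) / n                      # step 1: mean
    deviations = [x - mean for x in values]     # step 2: deviations from the mean
    squared = [d ** 2 for d in deviations]      # step 3: square each deviation
    sum_sq = sum(squared)                       # step 4: sum of squared deviations
    return sum_sq / (n - 1) if sample else sum_sq / n  # step 5: divide


wait_times = [5, 8, 4, 10, 7]  # minutes

print(round(variance_by_steps(wait_times), 2))                # 5.7  (sample)
print(round(variance_by_steps(wait_times, sample=False), 2))  # 4.56 (population)

# Cross-check against the standard library's sample and population variance
print(variance(wait_times), pvariance(wait_times))
```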
Standard Deviation
• Definition: Square root of the variance; measures the typical distance of values from the mean
• Business Applications:
• Measuring volatility in stock prices
• Evaluating consistency in production
• Quantifying risk in business metrics
• Advantages: Same units as original data, widely used in analysis
Standard Deviation - Examples
• Example 1: From the previous slide, the wait-time standard deviation = √5.7 ≈ 2.39 minutes
• Example 2: Sales performance (units): 45, 52, 49, 38, 56
• Mean = 48
• Sample standard deviation = √(190/4) = √47.5 ≈ 6.89 units
• Interpretation: If the data are approximately normally distributed, about 68% of values fall within ±1 standard deviation of the mean
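Both examples can be checked with the standard library. `statistics.stdev` uses the sample (n - 1) divisor, and `statistics.pstdev` uses the population divisor. The gap between the two is noticeable for a sample of only five values.

```python
from statistics import pstdev, stdev

wait_times = [5, 8, 4, 10, 7]         # minutes
sales_units = [45, 52, 49, 38, 56]    # units

print(round(stdev(wait_times), 2))    # 2.39  (√5.7)
print(round(stdev(sales_units), 2))   # 6.89  (√47.5, sample)
print(round(pstdev(sales_units), 2))  # 6.16  (√38, population)
```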
Coefficient of Variation
• Definition: Standardized measure of dispersion relative to the mean
• Formula: CV = (s / x̄) × 100%
• Business Applications:
• Comparing variability between different data sets
• Assessing relative risk between investments
• Comparing consistency across different business units
• Advantage: Allows comparison of dispersion across different scales
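The sketch below compares the two standard-deviation examples on a relative scale. It assumes the sample standard deviation is used for s, and that the data are on a ratio scale with a positive mean (CV is not meaningful otherwise). `coefficient_of_variation` is an illustrative name.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """CV = (s / x̄) × 100%, using the sample standard deviation."""
    return stdev(values) / mean(values) * 100


wait_times = [5, 8, 4, 10, 7]        # minutes
sales_units = [45, 52, 49, 38, 56]   # units

print(round(coefficient_of_variation(wait_times), 1))   # 35.1
print(round(coefficient_of_variation(sales_units), 1))  # 14.4
```

Sales have the larger absolute standard deviation (6.89 vs 2.39), yet wait times are relatively more variable once each is scaled by its own mean.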
Distribution Shape - Overview
Definition: Characteristics describing the form of a probability distribution
Key Measures:
• Skewness: Asymmetry of the distribution
• Kurtosis: Peakedness and tail heaviness
Business Importance:
• Informs appropriate statistical tests
• Reveals underlying data patterns
• Guides data transformation decisions
Skewness
Definition: Measure of asymmetry in a distribution
• Formula:
Types:
• Positive skew (right skew): longer tail to the right
• Negative skew (left skew): longer tail to the left
• Zero skew: symmetric distribution
Interpretation (typical pattern for unimodal distributions):
• Positive: Mean > Median > Mode
• Negative: Mean < Median < Mode
• Zero: Mean = Median = Mode
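The slide does not give a specific skewness formula. One common choice, assumed here, is the moment coefficient g1 = m3 / m2^(3/2), where mk = Σ(xi - x̄)^k / n. Statistical packages often apply a small-sample adjustment, so their values may differ slightly. The sketch applies this coefficient to two data sets from earlier slides. With only five observations each, the values are illustrative.

```python
from statistics import mean, median


def moment_skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2^(3/2), where mk = Σ(xi - x̄)^k / n."""
    n = len(values)
    xbar = sum(values) / n
    m2 = sum((x - xbar) ** 2 for x in values) / n
    m3 = sum((x - xbar) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


house_prices = [350, 280, 420, 315, 550]  # $000s, from the Median examples
sales_units = [45, 52, 49, 38, 56]        # units, from the Standard Deviation examples

for name, data in (("house prices", house_prices), ("sales units", sales_units)):
    print(name, "mean:", mean(data), "median:", median(data),
          "skewness:", round(moment_skewness(data), 2))
```

The house prices (mean 383 > median 350) come out positively skewed (about 0.75). The sales figures (mean 48 < median 49) come out negatively skewed (about -0.38).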
Skewness - Visual Representation
• Positively Skewed
Distributions:
• Income distributions
• Property values
• Time to complete tasks
• Negatively Skewed
Distributions:
• Test scores with ceiling effects
• Age at retirement
• Product purity levels
Kurtosis
• Definition: Measure of "tailedness" or peakedness of a distribution
• Types (measured as excess kurtosis, i.e., kurtosis minus 3, so the normal distribution scores 0):
• Leptokurtic (positive): More peaked, heavier tails
• Mesokurtic (zero): Normal distribution
• Platykurtic (negative): Flatter, lighter tails
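As with skewness, this sketch assumes the moment-based definition: excess kurtosis g2 = m4 / m2² - 3, which is about 0 for a normal distribution. The sales data comes from the Standard Deviation examples. The `spiky` list is a hypothetical data set (a tight cluster plus one extreme value) constructed only to show a positive result.

```python
def excess_kurtosis(values):
    """Moment-based excess kurtosis g2 = m4 / m2² - 3 (about 0 for a normal distribution)."""
    n = len(values)
    xbar = sum(values) / n
    m2 = sum((x - xbar) ** 2 for x in values) / n
    m4 = sum((x - xbar) ** 4 for x in values) / n
    return m4 / m2 ** 2 - 3


sales_units = [45, 52, 49, 38, 56]   # from the Standard Deviation examples
spiky = [10] * 9 + [40]              # hypothetical: tight cluster plus one extreme value

print(round(excess_kurtosis(sales_units), 2))  # -1.0 (negative: platykurtic)
print(round(excess_kurtosis(spiky), 2))        # 5.11 (positive: leptokurtic)
```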
Kurtosis - Business Implications
• Leptokurtic Distributions:
• Financial returns during market volatility
• Customer response times with outliers
• Platykurtic Distributions:
• Evenly distributed sales across product
lines
• Uniform quality control measurements
• Business Impact:
• Risk assessment and management
• Identifying unusual patterns
• Validating statistical assumptions
Introduction to Correlation
• Definition: Statistical measure indicating direction and strength of relationship between variables
• Key Characteristics:
• Direction (positive/negative)
• Strength (weak/moderate/strong)
• Linear vs. nonlinear
• Business Applications:
• Marketing effectiveness analysis
• Financial variable relationships
• Operational performance factors
Pearson Correlation Coefficient
• Definition: Measures linear relationship between two continuous
variables.
• Range: -1 to +1
• +1: Perfect positive correlation
• 0: No linear correlation
• -1: Perfect negative correlation
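Interpreting Correlation Coefficients
Strong Positive (0.7 to 1.0):
• As X increases, Y increases very consistently (points lie close to an upward-sloping line)
Moderate Positive (0.3 to 0.7):
• As X increases, Y tends to increase, with noticeable scatter
Weak Positive (0 to 0.3):
• As X increases, Y shows only a slight tendency to increase
• Same interpretations apply to negative correlations (-0.7 to -1.0, etc.), with Y tending to decrease as X increases
• These cut-offs are rules of thumb; conventions vary across fields
• Example: r = 0.85 between advertising spend and sales indicates a strong positive relationship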
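The rule-of-thumb bands can be written as a small helper. In this sketch, `describe_correlation` is a hypothetical name. The slide's bands share their endpoints, so the boundary handling is an assumption: |r| ≥ 0.7 counts as strong and 0.3 ≤ |r| < 0.7 as moderate.

```python
def describe_correlation(r):
    """Label r using the slide's bands: |r| >= 0.7 strong, >= 0.3 moderate, otherwise weak."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    if r == 0:
        return "no linear correlation"
    size = abs(r)
    if size >= 0.7:
        strength = "strong"
    elif size >= 0.3:
        strength = "moderate"
    else:
        strength = "weak"
    direction = "positive" if r > 0 else "negative"
    return f"{strength} {direction}"


print(describe_correlation(0.85))   # strong positive
print(describe_correlation(-0.45))  # moderate negative
```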
Correlation Examples
Example 1: Advertising Expenditure vs. Sales
• Data points: (20,40), (25,46), (30,52), (35,58), (40,65)
• Correlation coefficient: r ≈ 0.999
• Interpretation: Very strong positive relationship
Example 2: Price vs. Demand
• Data points: (10,50), (20,40), (30,35), (40,25), (50,15)
• Correlation coefficient: r ≈ -0.995
• Interpretation: Very strong negative relationship
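The two coefficients can be reproduced from the data points with the definitional formula r = Σ(x - x̄)(y - ȳ) / √[Σ(x - x̄)² · Σ(y - ȳ)²]. In this minimal sketch, `pearson_r` is an illustrative name, not a library function.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson's r = Σ(x - x̄)(y - ȳ) / √[Σ(x - x̄)² · Σ(y - ȳ)²]."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


advertising = [20, 25, 30, 35, 40]
sales = [40, 46, 52, 58, 65]
price = [10, 20, 30, 40, 50]
demand = [50, 40, 35, 25, 15]

print(round(pearson_r(advertising, sales), 3))  # 0.999
print(round(pearson_r(price, demand), 3))       # -0.995
```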
Correlation vs. Causation
Correlation: Statistical relationship between variables
Causation: One variable directly influences another
Important Distinction:
• Correlation does not imply causation.
• Third variables may create spurious correlations.
• Coincidental relationships can show strong correlation.
• Business Example: Ice cream sales and drowning deaths are correlated because both rise in hot summer weather, not because one causes the other
Other Correlation Measures
Spearman's Rank Correlation:
• For ordinal data or non-linear but monotonic relationships
• Based on ranks rather than actual values
Point-Biserial Correlation:
• Between continuous and binary variables
Kendall's Tau:
• Non-parametric, rank-based measure of relationship
• Useful for small sample sizes
• When to use each measure
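Spearman's rho can be computed as Pearson's r applied to the ranks of each variable, with tied values given their average rank. The sketch below uses only the standard library, and its helper names are illustrative. It is applied to the price/demand data from the Correlation Examples slide. Pearson's r there is about -0.995, while Spearman's rho is exactly -1, because demand falls every time price rises even though the drop is not perfectly linear.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        avg_rank = (start + end) / 2 + 1  # positions are 0-based, ranks are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = avg_rank
        start = end + 1
    return ranks


def pearson_r(xs, ys):
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson's r computed on the ranks of each variable."""
    return pearson_r(average_ranks(xs), average_ranks(ys))


price = [10, 20, 30, 40, 50]
demand = [50, 40, 35, 25, 15]
print(round(pearson_r(price, demand), 3))  # -0.995 (linear association)
print(spearman_rho(price, demand))         # -1.0 (perfectly monotonic)
```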
Introduction to Errors in Research
Definition:
• Deviations between sample statistics and population parameters
Types:
• Sampling errors: Due to sampling process
• Non-sampling errors: All other sources
Business Impact:
• Affects decision quality
• Influences research reliability
• Determines confidence in findings
Sampling Errors
Definition: Differences between sample and population due to random sampling
Characteristics:
• Naturally occurs in all samples
• Can be statistically estimated
• Decreases with larger sample sizes
• Business Example: Market research survey with ±3% margin of error
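A "±3%" margin usually refers to a sample proportion at roughly 95% confidence. This sketch assumes simple random sampling, z = 1.96, and the conservative worst case p = 0.5, none of which the slide specifies. Under those assumptions, a sample of about 1,067 respondents gives ±3%, and the margin shrinks as n grows.

```python
from math import sqrt


def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a sample proportion under simple random sampling."""
    return z * sqrt(p * (1 - p) / n)


# p = 0.5 gives the largest (most conservative) margin for a given n
for n in (100, 400, 1067):
    print(n, f"±{100 * margin_of_error(0.5, n):.1f}%")
```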
Factors Affecting Sampling Error
Sample Size:
• Larger samples → smaller sampling errors
Population Variability:
• More heterogeneous populations → larger sampling errors
Sampling Fraction:
• Higher percentage of population sampled → smaller errors
Sampling Design:
• Stratified sampling can reduce error compared to simple random sampling
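The sampling-fraction effect is commonly expressed with the finite population correction, √((N - n)/(N - 1)), which multiplies the usual standard error. This sketch applies it to a sample proportion. The population sizes are hypothetical, and `se_proportion` is an illustrative name. Stratified designs need their own variance formulas and are not shown.

```python
from math import sqrt


def se_proportion(p, n, population_size=None):
    """Standard error of a sample proportion, optionally with the finite population correction."""
    se = sqrt(p * (1 - p) / n)
    if population_size is not None:
        # Finite population correction: shrinks the SE as n approaches N
        se *= sqrt((population_size - n) / (population_size - 1))
    return se


n = 500
# Hypothetical population sizes: None means "effectively infinite"
for population_size in (None, 100_000, 2_000, 500):
    print(population_size, round(se_proportion(0.5, n, population_size), 4))
```

When the sample is a tiny fraction of the population, the correction is close to 1. When the whole population is surveyed (n = N), the sampling error drops to zero.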
Non-Sampling Errors
• Definition: Errors not attributable to sampling variation
• Types:
• Coverage errors (frame errors)
• Measurement errors
• Processing errors
• Non-response errors
• Response errors
• Characteristics: Often more problematic than sampling errors, harder to measure
Types of Non-Sampling Errors - Detailed
Coverage Errors:
• Target population vs. sampling frame mismatch
• Example: Online survey excluding non-internet users
Measurement Errors:
• Flawed measurement instruments
• Example: Ambiguous survey questions
Non-response Errors:
• Bias from systematic non-participation
• Example: Lower response rates from certain demographics
Minimizing Research Errors
Sampling Error Reduction:
• Increase sample size
• Use stratified sampling when appropriate
• Ensure random selection within strata
Non-sampling Error Reduction:
• Careful questionnaire design
• Thorough interviewer training
• Multiple contact attempts for non-respondents
• Data validation and cleaning procedures
Degrees of Freedom - Concept
• Definition: Number of values free to vary in the final calculation of a statistic
• Conceptual Understanding:
• Constraints reduce degrees of freedom
• Related to sample size and parameters estimated
• General Formula: df = n - k
• n = sample size
• k = number of parameters estimated
Degrees of Freedom - Examples
• One-Sample t-test: df = n - 1
• One parameter (mean) is estimated
• Independent Samples t-test: df = n₁ + n₂ - 2
• Two parameters (two means) are estimated
• Correlation: df = n - 2
• Two parameters (intercept and slope of the fitted line) are estimated
• Business Context: Affects critical values in hypothesis testing
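"Free to vary" can be made concrete with the wait-time data from the Variance example. Once the mean of five values is fixed, the first four can be anything, but the fifth is then forced. The df helper names and the sample sizes 12 and 15 are hypothetical.

```python
def df_one_sample_t(n):
    return n - 1          # one mean estimated


def df_independent_t(n1, n2):
    return n1 + n2 - 2    # two means estimated


def df_correlation(n):
    return n - 2          # intercept and slope estimated


wait_times = [5, 8, 4, 10, 7]      # minutes, from the Variance example
n = len(wait_times)
fixed_mean = sum(wait_times) / n   # 6.8
free_values = wait_times[:-1]      # the first n - 1 values can be chosen freely
forced_last = n * fixed_mean - sum(free_values)
print(round(forced_last, 6))       # 7.0: determined by the mean constraint

# Hypothetical sample sizes for the two-sample case
print(df_one_sample_t(n), df_independent_t(12, 15), df_correlation(n))  # 4 25 3
```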
Standard Error
• Definition: Standard deviation of a sampling distribution
• Formulas:
• Standard Error of Mean: SE = σ / √n
• Estimated SE of Mean: SE = s / √n
• Importance:
• Measures precision of sample statistics
• Used in confidence interval construction
• Foundation for inferential statistics
Standard Error - Applications
• Confidence Intervals:
• 95% CI = Sample statistic ± 1.96 × Standard Error
• Hypothesis Testing:
• Test statistic = (Sample statistic − Hypothesized value) / Standard Error
• Business Example:
• SE of $2 in mean customer spending of $45
• Interpretation: The 95% CI is $45 ± 1.96 × $2 = $45 ± $3.92 (about $41.08 to $48.92), so we can be roughly 95% confident the actual mean is within about $4 of the estimate
Relationship Between Standard Error,
Sample Size and Variability
• Sample Size Effect:
• SE ∝ 1/√n (inversely proportional to square root of sample size)
• Doubling sample size reduces SE by factor of √2
• Population Variability Effect:
• SE ∝ σ (directly proportional to population standard deviation)
• Business Implication: Balancing precision needs with research costs
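The proportionality rules can be checked numerically. The population standard deviation of 12 below is a hypothetical value chosen only for illustration. Doubling n divides SE by √2 (about a 29% reduction), while halving SE requires four times the sample size. This gap is the cost trade-off behind the business implication above.

```python
from math import sqrt


def standard_error(sigma, n):
    return sigma / sqrt(n)   # SE = σ / √n


sigma = 12.0  # hypothetical population standard deviation
se_100 = standard_error(sigma, 100)
se_200 = standard_error(sigma, 200)
se_400 = standard_error(sigma, 400)

print(round(se_100 / se_200, 4))   # 1.4142: doubling n divides SE by √2
print(round(se_100 / se_400, 4))   # 2.0: quadrupling n halves SE
print(round(standard_error(2 * sigma, 100) / se_100, 4))  # 2.0: doubling σ doubles SE
```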
Univariate Analysis - Introduction
Definition: Statistical analysis of a single variable
Purpose:
• Understand distribution characteristics
• Identify central tendency and dispersion
• Examine data quality and patterns
Business Applications:
• Customer demographic analysis
• Product performance metrics
• Financial indicator assessment
Univariate Analysis - Methods
Frequency Distributions:
• Tables showing count/percentage of observations
Graphical Methods:
• Histograms, bar charts, pie charts
• Box plots, stem-and-leaf plots
Numerical Measures:
• All previously discussed central tendency and dispersion measures
• Example: Analysis of customer age distribution in market segment.
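A frequency distribution and a crude text histogram can be built with `collections.Counter`. The sketch reuses the product ratings from the Mode examples because the slide gives no customer-age data. The column layout is just one possible presentation.

```python
from collections import Counter

ratings = [4, 3, 5, 4, 2, 4, 5]   # product ratings (1-5) from the Mode examples
counts = Counter(ratings)
n = len(ratings)

print("Rating  Count  Percent  Histogram")
for value in sorted(counts):
    share = 100 * counts[value] / n
    print(f"{value:>6}  {counts[value]:>5}  {share:>6.1f}%  {'#' * counts[value]}")
```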
Univariate Analysis - Example
Data: Employee satisfaction scores (1-5 scale)
Numerical Analysis:
• Mean: 3.8
• Median: 4
• Mode: 4
• Standard Deviation: 0.9
• Graphical Analysis: [Histogram showing distribution]
• Business Insights: Generally high satisfaction with some variation
Bivariate Analysis - Introduction
Definition: Statistical analysis examining relationship between two variables
Purpose:
• Determine association between variables
• Identify patterns and relationships
• Support predictive analysis
Business Applications:
• Price-demand relationships
• Marketing spend vs. sales
• Employee training and productivity
Bivariate Analysis - Methods
Cross-tabulations: For categorical variables
Scatter Plots: For continuous variables
Correlation Analysis: Pearson, Spearman, etc.
Simple Regression: Linear relationship modeling
Contingency Tables: Joint frequency distributions
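Simple regression can be sketched with the ordinary least-squares formulas: slope b = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and intercept a = ȳ - b·x̄. The sketch fits the advertising/sales points from the Correlation Examples slide. It does not use the data behind the next slide's equation, which is not shown, so the coefficients differ from that example. `least_squares_line` is an illustrative name.

```python
def least_squares_line(xs, ys):
    """Fit y = a + b·x by ordinary least squares; returns (intercept a, slope b)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


# Advertising vs. sales points from the Correlation Examples slide
advertising = [20, 25, 30, 35, 40]
sales = [40, 46, 52, 58, 65]

a, b = least_squares_line(advertising, sales)
print(f"Sales ≈ {a:.2f} + {b:.2f} × Advertising")  # Sales ≈ 15.00 + 1.24 × Advertising
```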
Bivariate Analysis - Example
• Variables: Advertising Expenditure
($000s) and Sales ($000s)
• Correlation Analysis: r = 0.92
• Regression Equation: Sales = 120 +
4.5 × Advertising
• Scatter Plot: See figure
• Business Insight: Strong positive relationship; each additional $1,000 in advertising is associated with about $4,500 in additional sales
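The slope interpretation follows from the units: both variables are in $000s, so a one-unit ($1,000) increase in advertising raises predicted sales by 4.5 units ($4,500). The advertising level of 20 below is a hypothetical input. Predictions far outside the advertising range used to fit the line would be extrapolation.

```python
def predicted_sales(advertising):
    """Sales ($000s) from the slide's equation: Sales = 120 + 4.5 × Advertising ($000s)."""
    return 120 + 4.5 * advertising


print(predicted_sales(20))                        # 210.0
print(predicted_sales(21) - predicted_sales(20))  # 4.5: one extra $000 of advertising
```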
Moving Beyond Bivariate Analysis
Multivariate Analysis:
• Examining three or more variables simultaneously
• Multiple regression, factor analysis, cluster analysis
From Description to Inference:
• Hypothesis testing
• Confidence intervals
• Predictive modeling
• Business Value: More comprehensive understanding of complex business phenomena
Business Applications of Descriptive Statistics
Marketing:
• Customer segmentation
• Campaign effectiveness analysis
• Price sensitivity studies
Operations:
• Quality control monitoring
• Process capability analysis
• Productivity measurement
Finance:
• Risk assessment
• Investment return analysis
• Cost variance analysis
Summary of Key Concepts
Central Tendency: Mean, median, mode represent the center
Dispersion: Range, variance, standard deviation measure spread
Distribution Shape: Skewness and kurtosis describe asymmetry and peakedness
Relationships: Correlation quantifies associations between variables
Research Quality: Understanding sampling and non-sampling errors
Analysis Approach: Choosing appropriate univariate or bivariate methods