UNIVARIATE ANALYSIS
GROUP 1
310623205122 -M.S Pupesh
310623205133 -Sachin
310723205143 –[Link]
30623205172 - M. Vasanth
310623205180 -S. Yogidha
INTRODUCTION TO UNIVARIATE ANALYSIS
• Definition: Analysis of a single variable in a dataset
• Purpose: Understand distribution, central tendency, spread
• Importance: First step in Exploratory Data Analysis (EDA)
• Applications: Data exploration, preprocessing, model preparation
DISTRIBUTION OF VARIABLES
• Types of variables:
• Categorical (e.g., gender, colors)
• Discrete (e.g., number of students)
• Continuous (e.g., height, weight)
• Distributions show frequency of values
• Skewness: Left-skewed, Right-skewed, Symmetrical
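A minimal sketch of the ideas on this slide, using only the Python standard library: a frequency table for a categorical variable and a moment-based skewness coefficient for a numeric one. The sample values and the helper names `frequency_table` and `skewness` are hypothetical, chosen for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def frequency_table(values):
    """Count how often each distinct value occurs."""
    return dict(Counter(values))


def skewness(values):
    """Moment skewness: the average of the cubed z-scores.

    Positive -> right-skewed, negative -> left-skewed, near 0 -> symmetrical.
    """
    m = mean(values)
    s = pstdev(values)
    return sum(((x - m) / s) ** 3 for x in values) / len(values)


# Hypothetical data for illustration only
colors = ["red", "blue", "red", "green", "red", "blue"]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 20]
symmetric = [1, 2, 3, 4, 5]

print(frequency_table(colors))
print(skewness(right_skewed))  # positive: long tail on the right
print(skewness(symmetric))     # zero: deviations cancel out
```

Other skewness definitions exist (for example, bias-corrected sample skewness), so values from statistical packages may differ slightly from this one.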
NUMERICAL SUMMARIES OF LEVEL
1. Mean – arithmetic average
2. Median – middle value, robust to outliers
3. Mode – most frequent value
Helps understand the center of data
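The three measures of center can be computed with Python's built-in `statistics` module. The sketch below uses a hypothetical list with one extreme value to show why the median is described as robust to outliers.

```python
from statistics import mean, median, mode

# Hypothetical values with one outlier (300)
salaries = [30, 32, 35, 35, 40, 300]

center_mean = mean(salaries)      # pulled upward by the outlier
center_median = median(salaries)  # middle of the sorted data, barely affected
center_mode = mode(salaries)      # most frequent value

print(center_mean, center_median, center_mode)
```

Here the mean lands far above most of the data, while the median and mode both stay at 35.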
NUMERICAL SUMMARIES OF SPREAD
• Range = Max – Min
• Variance (σ²) – average squared deviation
• Standard Deviation (σ) – spread around mean
• Interquartile Range (IQR) – spread of middle 50%
• Importance: Detecting data variability & outliers
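The spread measures above can be sketched with the standard library as follows. Population formulas (`pvariance`, `pstdev`) are used to match the σ notation; the data, the helper name `iqr_outliers`, and the 1.5 × IQR outlier fence are assumptions for illustration (the fence is a common convention, not something stated on the slide).

```python
from statistics import pstdev, pvariance, quantiles

# Hypothetical data for illustration only
data = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(data) - min(data)  # Range = Max - Min
variance = pvariance(data)          # average squared deviation from the mean
std_dev = pstdev(data)              # square root of the variance

# Quartiles via the module's default method; other methods give
# slightly different cut points on small samples.
q1, q2, q3 = quantiles(data, n=4)
iqr = q3 - q1                       # spread of the middle 50%


def iqr_outliers(values, k=1.5):
    """Flag values beyond k * IQR from the quartiles (assumed convention)."""
    lo_q, _, hi_q = quantiles(values, n=4)
    spread = hi_q - lo_q
    lower, upper = lo_q - k * spread, hi_q + k * spread
    return [x for x in values if x < lower or x > upper]


print(data_range, variance, std_dev, iqr)
print(iqr_outliers(data + [30]))
```

For a sample drawn from a larger population, `statistics.variance` and `statistics.stdev` divide by n - 1 instead of n.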
SCALING AND STANDARDIZING
1. Different features may have different units/scales
2. Min-Max Scaling: Rescales to [0, 1]
3. Z-score Standardization: Mean = 0, SD = 1
4. Important for: kNN, clustering, and regression fitted with regularization or gradient descent
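Both rescaling methods are short enough to write by hand. The sketch below assumes a non-constant list of numbers (a constant feature would cause division by zero); the function names and the height values are hypothetical.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale to [0, 1]: (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]


def z_score(values):
    """Standardize to mean 0 and SD 1: (x - mean) / SD."""
    m, s = mean(values), pstdev(values)
    return [(x - m) / s for x in values]


# Hypothetical feature for illustration only
heights_cm = [150, 160, 170, 180, 190]

scaled = min_max_scale(heights_cm)
standardized = z_score(heights_cm)

print(scaled)
print(standardized)
```

In a modeling workflow, the minimum, maximum, mean, and SD would normally be computed on the training data only and then reused on new data.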
INEQUALITY
• Measures of data inequality
• Lorenz Curve – graphical representation
• Gini Coefficient: 0 = perfect equality, 1 = maximum inequality
• Example: Income distribution
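One way to connect the two measures: the Gini coefficient equals 1 minus twice the area under the Lorenz curve. The sketch below builds the Lorenz points from a list of non-negative incomes with a positive total and integrates with the trapezoid rule. The function names and income lists are hypothetical.

```python
def lorenz_points(values):
    """Points (population share, cumulative income share), poorest first."""
    ordered = sorted(values)
    total = sum(ordered)
    points = [(0.0, 0.0)]
    running = 0
    for i, x in enumerate(ordered, start=1):
        running += x
        points.append((i / len(ordered), running / total))
    return points


def gini(values):
    """Gini = 1 - 2 * (area under the Lorenz curve), trapezoid rule."""
    pts = lorenz_points(values)
    area = sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(pts, pts[1:])
    )
    return 1 - 2 * area


# Hypothetical incomes for illustration only
equal_incomes = [10, 10, 10, 10]
one_holds_all = [0, 0, 0, 100]

print(gini(equal_incomes))
print(gini(one_holds_all))
```

With a finite sample of n values, the largest value this formula can reach is (n - 1) / n, which approaches 1 as n grows.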
RESIDUALS AND TRANSFORMATIONS
• Residuals = Actual – Predicted values
• Used in regression analysis
• Transformations help with skewed data
1. Log Transformation
2. Square Root
3. Box-Cox
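The sketch below illustrates the residual formula and the three transformations on hypothetical, strictly positive data. The Box-Cox parameter `lam` is fixed by hand here as an assumption; in practice it is usually estimated from the data (for example, by maximum likelihood). The `skewness` helper is the moment-based coefficient, included only to compare shapes before and after.

```python
import math
from statistics import mean, pstdev


def residuals(actual, predicted):
    """Residual = actual - predicted, element by element."""
    return [a - p for a, p in zip(actual, predicted)]


def box_cox(x, lam):
    """Box-Cox transform for x > 0; lam = 0 reduces to the natural log."""
    if lam == 0:
        return math.log(x)
    return (x ** lam - 1) / lam


def skewness(values):
    """Moment skewness: the average of the cubed z-scores."""
    m, s = mean(values), pstdev(values)
    return sum(((x - m) / s) ** 3 for x in values) / len(values)


# Hypothetical right-skewed data for illustration only
skewed = [1, 2, 2, 3, 4, 5, 8, 15, 40, 100]

log_t = [math.log(x) for x in skewed]
sqrt_t = [math.sqrt(x) for x in skewed]
box_cox_t = [box_cox(x, 0.5) for x in skewed]  # lam = 0.5 is an assumed value

print(residuals([3, 5, 7], [2, 5, 9]))
print(skewness(skewed), skewness(sqrt_t), skewness(log_t))
```

On this data the square root reduces the right skew and the log reduces it further, which is the usual ordering for positive, heavy-tailed values.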
FUNCTION OF ONE VARIABLE AND SPLINES
• Functions of one variable: apply f(x) for transformation
• Splines: Piecewise polynomials for smoothing curves
Applications:
1. Curve fitting
2. Data smoothing
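To make "piecewise polynomial" concrete, here is a minimal sketch of the simplest case: a degree-1 (piecewise linear) spline that joins straight-line segments at the knots. This is a simplification chosen for brevity; smoothing applications more often use cubic splines, which also match slopes and curvature at the knots. The function name `linear_spline` and the sample points are hypothetical.

```python
from bisect import bisect_right


def linear_spline(knots_x, knots_y):
    """Return f(x), a piecewise linear function through the given knots.

    knots_x must be sorted in increasing order with at least two points.
    """
    def f(x):
        # Locate the segment containing x; clamp so that points outside
        # the knot range reuse the first or last segment.
        i = bisect_right(knots_x, x) - 1
        i = max(0, min(i, len(knots_x) - 2))
        x0, x1 = knots_x[i], knots_x[i + 1]
        y0, y1 = knots_y[i], knots_y[i + 1]
        t = (x - x0) / (x1 - x0)
        return y0 + t * (y1 - y0)

    return f


# Hypothetical samples of y = x**2 for illustration only
xs = [0, 1, 2, 3]
ys = [0, 1, 4, 9]
f = linear_spline(xs, ys)

print(f(1.5))  # halfway between (1, 1) and (2, 4)
```

Each segment is its own polynomial, and the pieces agree at the knots, which is the defining property of a spline.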
CROSS-VALIDATION
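• Purpose: Evaluate model performance on data not used for fitting
• K-fold cross-validation: data split into k parts; each part is held out once for testing while the rest is used for training
• Helps detect overfitting
• Relevance in univariate analysis: validate transformations, scaling effects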
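A minimal sketch of k-fold cross-validation in a univariate setting, using only the standard library. The "model" is deliberately trivial: it predicts every held-out value with the mean of the training folds and is scored by mean squared error. The function names and data are hypothetical, and the folds are contiguous for simplicity (real data would usually be shuffled first).

```python
from statistics import mean


def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate_mean_predictor(values, k):
    """Average held-out MSE when predicting with the training-fold mean."""
    scores = []
    for test_idx in k_fold_indices(len(values), k):
        held_out = set(test_idx)
        train = [v for i, v in enumerate(values) if i not in held_out]
        test = [values[i] for i in test_idx]
        prediction = mean(train)  # fitted on training folds only
        scores.append(mean([(y - prediction) ** 2 for y in test]))
    return mean(scores)


# Hypothetical data for illustration only
values = [3, 5, 4, 6, 5, 7, 4, 6, 5, 5]

print(k_fold_indices(10, 3))
print(cross_validate_mean_predictor(values, k=5))
```

Any preprocessing step, such as a scaling or transformation fitted to the data, belongs inside the loop so that it is learned from the training folds only.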
CONCLUSION
Univariate analysis helps us understand the distribution, center, and spread of a single variable, forming the foundation for deeper data analysis and modeling.