Introduction To Univariate Analysis

Uploaded by

yogidhasoban

UNIVARIATE ANALYSIS

GROUP 1
310623205122 - M.S Pupesh
310623205133 - Sachin
310723205143 - [Link]
30623205172 - M. Vasanth
310623205180 - S. Yogidha
INTRODUCTION TO UNIVARIATE ANALYSIS

• Definition: Analysis of a single variable in a dataset

• Purpose: Understand distribution, central tendency, spread

• Importance: First step in Exploratory Data Analysis (EDA)

• Applications: Data exploration, preprocessing, model preparation


DISTRIBUTION OF VARIABLES

• Types of variables:

• Categorical (e.g., gender, colors)

• Discrete (e.g., number of students)

• Continuous (e.g., height, weight)

• Distributions show frequency of values

• Skewness: left-skewed, right-skewed, or symmetrical
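
As a rough illustration of the two ideas above, the sketch below counts value frequencies and computes one common skewness measure (the mean of cubed z-scores). The sample lists and the function names `frequency_table` and `skewness` are made up for this example; other skewness definitions exist.

```python
from collections import Counter
from statistics import mean, pstdev


def frequency_table(values):
    """Count how often each distinct value occurs."""
    return dict(Counter(values))


def skewness(values):
    """Population moment coefficient of skewness: mean of cubed z-scores."""
    m = mean(values)
    s = pstdev(values)
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return sum(((x - m) / s) ** 3 for x in values) / len(values)


# Hypothetical sample data, for illustration only.
colors = ["red", "blue", "red", "green", "red"]
right_skewed = [1, 2, 2, 3, 3, 3, 20]
symmetric = [1, 2, 3, 4, 5]

print(frequency_table(colors))   # "red" occurs 3 times
print(skewness(right_skewed))    # positive: long tail on the right
print(skewness(symmetric))       # approximately 0 for a symmetric sample
```

A negative result would indicate a left-skewed sample.
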
NUMERICAL SUMMARIES OF LEVEL

1. Mean – arithmetic average

2. Median – middle value, robust to outliers

3. Mode – most frequent value

Together, these summaries describe the center of the data
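
A small sketch of the three summaries using Python's standard `statistics` module. The `salaries` list is invented for illustration and includes one extreme value to show why the median is described as robust to outliers.

```python
from statistics import mean, median, mode

# Hypothetical salaries (in thousands); 500 is a deliberate outlier.
salaries = [30, 32, 32, 35, 40, 500]

print(mean(salaries))    # 111.5, pulled upward by the outlier
print(median(salaries))  # 33.5, barely affected by the outlier
print(mode(salaries))    # 32, the most frequent value
```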


NUMERICAL SUMMARIES OF SPREAD

• Range = max - min

• Variance (σ²) – average squared deviation from the mean

• Standard Deviation (σ) – square root of the variance; spread around the mean

• Interquartile Range (IQR) – spread of the middle 50% (Q3 - Q1)

• Importance: Detecting data variability and outliers
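
The measures above can be sketched with the standard `statistics` module. The `data` list is invented, the population (not sample) formulas are used, and the 1.5 × IQR outlier fences are one common convention that is an assumption here, not something stated on the slide.

```python
from statistics import pvariance, pstdev, quantiles

# Hypothetical data, for illustration only.
data = [2, 4, 4, 4, 5, 5, 7, 9]

value_range = max(data) - min(data)   # 7
variance = pvariance(data)            # 4 (population variance)
std_dev = pstdev(data)                # 2.0

# Quartiles; the "inclusive" method treats the data as the full population.
q1, q2, q3 = quantiles(data, n=4, method="inclusive")  # 4.0, 4.5, 5.5
iqr = q3 - q1                         # 1.5

# Assumed convention: flag points more than 1.5 * IQR outside the quartiles.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(value_range, variance, std_dev, iqr, outliers)
```

With these numbers, only the value 9 falls outside the fences.
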
SCALING AND STANDARDIZING

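1. Different features may have different units/scales

2. Min-Max Scaling: rescales values to [0, 1] using (x - min) / (max - min)

3. Z-score Standardization: rescales to mean = 0, SD = 1 using (x - mean) / SD

4. Essential for distance-based methods such as kNN and clustering; also helps regularized or gradient-based regression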
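
A minimal sketch of both rescaling methods in plain Python. The function names `min_max_scale` and `z_score` and the `heights_cm` data are hypothetical, and the population standard deviation is assumed for the z-score.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot scale a constant variable")
    return [(x - lo) / (hi - lo) for x in values]


def z_score(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    if s == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(x - m) / s for x in values]


# Hypothetical heights in centimetres.
heights_cm = [150, 160, 170, 180, 190]

print(min_max_scale(heights_cm))  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(z_score(heights_cm))        # centred on 0, spread of 1
```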


INEQUALITY

• Measures of data inequality

• Lorenz Curve – graphical representation of the cumulative share of the total held by the cumulative share of the population

• Gini Coefficient: 0 = perfect equality, 1 = maximum inequality

• Example: Income distribution
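
The sketch below computes Lorenz curve points and a Gini coefficient for a small list of incomes. The function names and income values are invented for illustration, and the rank-based formula used is one of several equivalent ways to compute the coefficient. For a finite sample of n values the maximum it reaches is (n - 1) / n, which approaches 1 as n grows.

```python
def lorenz_points(values):
    """Return (population share, cumulative value share) points, starting at (0, 0)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    points = [(0.0, 0.0)]
    running = 0
    for i, x in enumerate(xs, start=1):
        running += x
        points.append((i / n, running / total))
    return points


def gini(values):
    """Gini coefficient of non-negative values via the sorted-rank formula."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total <= 0 or xs[0] < 0:
        raise ValueError("need non-negative values with a positive total")
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


# Hypothetical incomes.
print(gini([5, 5, 5, 5]))        # 0.0: everyone earns the same
print(gini([0, 0, 0, 10]))       # 0.75: one person holds everything (n = 4)
print(lorenz_points([1, 1, 2]))  # ends at (1.0, 1.0)
```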


RESIDUALS AND TRANSFORMATIONS

• Residuals = actual - predicted values

• Used in regression analysis

• Transformations help with skewed data

1. Log Transformation

2. Square Root

3. Box-Cox
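
A short sketch of residuals and the three transformations. The helper names and data are hypothetical. The Box-Cox function takes a hand-picked `lam` value, whereas in practice lambda is usually estimated from the data; both the log and Box-Cox transforms assume strictly positive values.

```python
import math


def residuals(actual, predicted):
    """Residual for each observation: actual minus predicted."""
    return [a - p for a, p in zip(actual, predicted)]


def box_cox(values, lam):
    """Box-Cox transform with a fixed lambda; lam == 0 reduces to the natural log."""
    if any(x <= 0 for x in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(x) for x in values]
    return [(x ** lam - 1) / lam for x in values]


# Hypothetical right-skewed values spanning several orders of magnitude.
incomes = [1, 10, 100, 1000]

log_incomes = [math.log10(x) for x in incomes]   # evenly spaced after the log
sqrt_incomes = [math.sqrt(x) for x in incomes]   # milder compression than the log
bc_incomes = box_cox(incomes, 0.5)               # lam = 0.5 is an arbitrary choice

print(residuals([3, 5, 7], [2.5, 5, 8]))  # [0.5, 0, -1]
print(log_incomes, sqrt_incomes, bc_incomes)
```
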
FUNCTIONS OF ONE VARIABLE AND SPLINES

• Functions of one variable: apply f(x) to transform the data

• Splines: piecewise polynomials, joined at knots, for smoothing curves

Applications:

[Link] fitting

[Link] smoothing
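
To make "piecewise polynomial" concrete, here is a sketch of the simplest case: a linear spline, where each piece between two knots is a degree-1 polynomial. Smoothing splines in practice usually use cubic pieces, so this is a simplification; the function name `linear_spline` and the knot values are invented for the example.

```python
from bisect import bisect_right


def linear_spline(knots_x, knots_y):
    """Return a function f(x) that joins the knots with straight-line pieces."""
    if len(knots_x) != len(knots_y) or len(knots_x) < 2:
        raise ValueError("need at least two knots with matching x and y")
    if any(b <= a for a, b in zip(knots_x, knots_x[1:])):
        raise ValueError("knot x-values must be strictly increasing")

    def f(x):
        if not knots_x[0] <= x <= knots_x[-1]:
            raise ValueError("x is outside the knot range")
        # Find the piece containing x; clamp so the last knot uses the final piece.
        i = min(bisect_right(knots_x, x) - 1, len(knots_x) - 2)
        x0, x1 = knots_x[i], knots_x[i + 1]
        y0, y1 = knots_y[i], knots_y[i + 1]
        t = (x - x0) / (x1 - x0)
        return y0 + t * (y1 - y0)

    return f


# Hypothetical knots: the curve rises from (0, 0) to (1, 2), then stays flat to (3, 2).
f = linear_spline([0, 1, 3], [0, 2, 2])
print(f(0.5))  # 1.0, halfway along the first piece
print(f(2))    # 2.0, on the flat second piece
```
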
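CROSS-VALIDATION

• Purpose: evaluate model performance on data not used for training

• K-fold cross-validation: data is split into k parts; each part is held out once for testing while the model is trained on the rest

• Helps detect overfitting

• Relevance in univariate analysis: validate transformations, scaling effects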
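
The splitting step of k-fold cross-validation can be sketched as below. The generator name `k_fold_indices` is hypothetical, and the folds are contiguous with no shuffling, which is a simplification; data is usually shuffled first unless its order matters.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


folds = list(k_fold_indices(10, 3))
for train, test in folds:
    print(len(train), test)
```

When checking a scaling or transformation step this way, its parameters (for example min, max, mean, SD) should be computed from the training indices only and then applied to the held-out fold, so that the test data does not leak into the fit.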


CONCLUSION

Univariate analysis helps us understand the distribution, center, and spread of a single variable, forming the foundation for deeper data analysis and modeling.
