Data Visualization HW Summary

Uploaded by

jevictoria
Data Visualization Homework Summary (HW1 - HW3)

HW2 Summary: Data Visualization

Q1a. Why is countplot not suitable?

- countplot is for categorical data.

- Our data (literacy rate, income) is continuous, so countplot draws one bar per distinct value, which gives an overcrowded, unreadable plot.

- Use histplot, displot, or kdeplot instead.
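The overcrowding can be checked without plotting anything. The sketch below uses synthetic incomes (an assumption standing in for the "inc" column, not the homework data) and counts how many bars each approach would have to draw.

```python
import random
from collections import Counter

random.seed(0)
# Hypothetical continuous incomes (assumption: stand-in for the "inc" column)
incomes = [random.lognormvariate(8.5, 1.0) for _ in range(200)]

# A count plot draws one bar per distinct value
n_bars_countplot = len(Counter(incomes))

# A histogram groups values into bins first; 30 equal-width bins here
n_bins = 30
lo, hi = min(incomes), max(incomes)
width = (hi - lo) / n_bins
# Clamp the maximum value into the last bin
bin_counts = Counter(min(int((v - lo) / width), n_bins - 1) for v in incomes)

print("bars needed by a count plot:", n_bars_countplot)
print("occupied histogram bins:", len(bin_counts))
```

With continuous values, almost every observation is its own category, so the count plot needs roughly as many bars as there are rows, while the histogram never needs more than the chosen number of bins.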

Q1b. Plot income distribution using histplot (percentage)

Code:

sns.histplot(data=df, x="inc", stat="percent", bins=30, kde=False)

Observation:

- Most countries have low GNI; the distribution is right-skewed.

Rugplot / KDE / Displot Overview

- rugplot: Marks data locations

- kdeplot: Smooth curve estimating distribution

- displot: Figure-level function that draws a histogram, KDE, or ECDF and can overlay a KDE curve and a rug on the same figure

Q1c. Log Transform of Income

Code:

log_inc = np.log10(df["inc"])

sns.histplot(log_inc, kde=True, stat="density")

Observation:

- Symmetrizes right-skewed data

- Stabilizes variance for analysis
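The symmetrizing effect can be quantified with sample skewness. This sketch uses synthetic log-normal incomes (an assumption chosen because their logarithm is normal by construction) and a hand-written moment-based skewness helper.

```python
import math
import random

def skewness(values):
    """Moment-based sample skewness: third central moment over variance^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

random.seed(0)
# Hypothetical right-skewed incomes (assumption: stand-in for df["inc"])
inc = [random.lognormvariate(8.5, 1.0) for _ in range(2000)]
log_inc = [math.log10(v) for v in inc]

skew_raw = skewness(inc)      # clearly positive: long right tail
skew_log = skewness(log_inc)  # close to zero: roughly symmetric
print(skew_raw, skew_log)
```

Real income data will not be exactly log-normal, so the transformed skewness may not land as close to zero as it does here.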

Literacy Rate Transformation (Left-skewed to Symmetric)

Code:

sns.histplot(x=df['lit']**4, kde=True, stat='density')

Observation:

- Raising the values to a power (here the 4th) stretches the upper end of the scale, which helps symmetrize left-skewed data
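The same skewness check works for the power transform. The literacy values below are synthetic, drawn from a Beta(5, 1) distribution scaled to 0-100 (an assumption chosen only because it is left-skewed); the best exponent for real data may differ.

```python
import random

def skewness(values):
    """Moment-based sample skewness: third central moment over variance^1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

random.seed(0)
# Hypothetical left-skewed literacy rates in percent (assumption: stand-in for df["lit"])
lit = [100 * random.betavariate(5, 1) for _ in range(2000)]
lit_pow4 = [v ** 4 for v in lit]

skew_raw = skewness(lit)        # negative: long left tail
skew_pow4 = skewness(lit_pow4)  # closer to zero after the transform
print(skew_raw, skew_pow4)
```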



Q1d. Scatterplots: Literacy vs Income

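1. Raw values: sns.scatterplot(data=df, x="lit", y="inc")

2. Log-transformed income: sns.scatterplot(x=df["lit"], y=np.log10(df["inc"]))

3. Squared literacy & log income: sns.scatterplot(data=df, x="lit_squared", y="log_inc"), where lit_squared = df["lit"]**2 and log_inc = np.log10(df["inc"]) are stored as columns of df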

Observation:

- Transforming variables can linearize the relationship

- Makes trend easier to interpret and model
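One way to check whether a transform linearizes a relationship is to compare Pearson correlations before and after. The sketch below uses synthetic data in which log10 income is linear in literacy by construction (an assumption, so the outcome is illustrative rather than a finding about the homework data).

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

random.seed(0)
# Hypothetical data (assumption): log10(income) rises linearly with literacy plus noise
lit = [random.uniform(30, 100) for _ in range(1000)]
log_inc = [2.5 + 0.02 * x + random.gauss(0, 0.15) for x in lit]
inc = [10 ** v for v in log_inc]

r_raw = pearson(lit, inc)      # curved relationship lowers the linear correlation
r_log = pearson(lit, log_inc)  # straight-line relationship after the log transform
print(r_raw, r_log)
```

A higher correlation after the transform suggests that a straight line describes the transformed variables better, which is what the scatterplots show visually.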
