Data Visualization Homework Summary (HW1 - HW3)
HW2 Summary: Data Visualization
Q1a. Why is countplot not suitable?
- countplot draws one bar per distinct value, so it is meant for categorical data.
- Our data (literacy rate, income) is continuous, so countplot gives a bar for nearly every observation and the plot becomes overcrowded and unclear.
- Use histplot, displot, or kdeplot instead.
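To make the point concrete, here is a minimal sketch on synthetic values (made up for illustration, not the homework dataset). It compares a countplot-style tally, one bar per distinct value, with a histplot-style tally over fixed-width bins. The bin width of 5000 is an arbitrary assumption.

```python
import random
from collections import Counter

random.seed(0)
# Synthetic, right-skewed "income" values (illustrative only, not the homework data).
incomes = [random.lognormvariate(8, 1) for _ in range(200)]

# countplot-style tally: one bar per distinct value.
bars_per_value = Counter(incomes)

# histplot-style tally: group values into fixed-width bins first.
bin_width = 5000  # arbitrary width chosen for this sketch
bars_per_bin = Counter(int(x // bin_width) for x in incomes)

print(len(bars_per_value))  # number of bars countplot would need
print(len(bars_per_bin))    # number of bars after binning
```

With continuous data almost every value is unique, so the first tally needs roughly as many bars as there are rows, while the binned tally needs only a handful.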
Q1b. Plot income distribution using histplot (percentage)
Code:
import seaborn as sns
sns.histplot(data=df, x="inc", stat="percent", bins=30, kde=False)
Observation:
- Most countries have a low GNI, and a few have a very high one, so the distribution is right-skewed (long tail toward high incomes).
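As a rough illustration of what the stat option changes, the sketch below bins synthetic incomes by hand (the data and variable names are assumptions, not the homework dataset) and converts the counts to percent and to density. With percent the bar heights sum to 100. With density the bar areas sum to 1.

```python
import random

random.seed(1)
# Synthetic incomes (illustrative only).
incomes = [random.lognormvariate(8, 1) for _ in range(500)]

n_bins = 30
lo, hi = min(incomes), max(incomes)
width = (hi - lo) / n_bins

# Count how many values fall in each equal-width bin.
counts = [0] * n_bins
for x in incomes:
    i = min(int((x - lo) / width), n_bins - 1)  # keep the maximum in the last bin
    counts[i] += 1

n = len(incomes)
percent = [100 * c / n for c in counts]      # bar heights sum to 100
density = [c / (n * width) for c in counts]  # bar areas sum to 1

print(round(sum(percent), 6))
print(round(sum(d * width for d in density), 6))
```

Only the y-axis scale differs between the two; the shape of the histogram is the same.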
Rugplot / KDE / Displot Overview
- rugplot: Marks data locations
- kdeplot: Smooth curve estimating distribution
- displot: Figure-level function that can draw a histogram, KDE, and rug in one call (e.g. kind="hist" with kde=True and rug=True)
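To show what the smooth curve is, here is a hand-rolled Gaussian kernel density estimate on synthetic data. It is a sketch under assumptions: the bandwidth uses a common rule of thumb, and seaborn's own default may differ. The rug corresponds simply to the raw positions in data.

```python
import math
import random
import statistics

random.seed(2)
# Synthetic sample (illustrative only).
data = [random.gauss(50, 10) for _ in range(300)]

n = len(data)
# Rule-of-thumb bandwidth (an assumption; seaborn's default may differ).
h = 1.06 * statistics.stdev(data) * n ** (-1 / 5)

def kde(x):
    """Gaussian kernel density estimate at x."""
    total = sum(math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in data)
    return total / (n * h * math.sqrt(2 * math.pi))

# Evaluate on a grid wide enough to cover the tails.
lo, hi = min(data) - 5 * h, max(data) + 5 * h
step = (hi - lo) / 400
grid = [lo + i * step for i in range(401)]
curve = [kde(x) for x in grid]

# A density should integrate to about 1 (trapezoid rule).
area = sum((curve[i] + curve[i + 1]) / 2 * step for i in range(400))
print(round(area, 3))
```

Each data point contributes one small bell curve, and the estimate is their average, which is why the result is smooth where a histogram is blocky.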
Q1c. Log Transform of Income
Code:
import numpy as np
log_inc = np.log10(df["inc"])
sns.histplot(log_inc, kde=True, stat="density")
Observation:
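- The log transform compresses the long right tail, so the right-skewed income distribution becomes roughly symmetric
- It also stabilizes the variance, which helps later analysis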
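The effect can be checked numerically. The sketch below computes the sample skewness of synthetic right-skewed incomes before and after a log10 transform. The data and the helper name skewness are assumptions for illustration, not part of the homework.

```python
import math
import random

def skewness(values):
    """Sample skewness: mean cubed z-score (0 for symmetric data)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum(((v - mean) / sd) ** 3 for v in values) / n

random.seed(3)
# Synthetic right-skewed incomes (illustrative only).
incomes = [random.lognormvariate(8, 1) for _ in range(1000)]
log_incomes = [math.log10(x) for x in incomes]

print(round(skewness(incomes), 2))      # clearly positive
print(round(skewness(log_incomes), 2))  # near zero
```

A skewness near zero after the transform is the numeric counterpart of the more symmetric histogram.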
Literacy Rate Transformation (Left-skewed to Symmetric)
Code:
sns.histplot(x=df['lit']**4, kde=True, stat='density')
Observation:
- Raising the values to a power greater than 1 stretches the upper end of the scale, which helps symmetrize left-skewed data
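A small numeric check of this idea, on synthetic left-skewed percentages (the Beta-distributed data and the helper name skewness are assumptions, not the homework dataset): the skewness moves toward zero as the power increases.

```python
import math
import random

def skewness(values):
    """Sample skewness: mean cubed z-score (0 for symmetric data)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return sum(((v - mean) / sd) ** 3 for v in values) / n

random.seed(4)
# Synthetic left-skewed literacy rates in percent (illustrative only).
lit = [100 * random.betavariate(5, 1) for _ in range(2000)]

for power in (1, 2, 4):
    transformed = [v ** power for v in lit]
    print(power, round(skewness(transformed), 2))
```

The best power depends on the data, so the exponent 4 is worth comparing against nearby values rather than assuming.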
Q1d. Scatterplots: Literacy vs Income
1. Raw values: sns.scatterplot(data=df, x="lit", y="inc")
2. Log-transformed income: sns.scatterplot(x=df["lit"], y=np.log10(df["inc"]))
3. Squared literacy & log income: sns.scatterplot(x=df["lit"]**2, y=np.log10(df["inc"]))
Observation:
- Transforming variables can linearize relationship
- Makes trend easier to interpret and model
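The observation above can be sketched with a correlation check. The data here is synthetic and built so that log10(income) is linear in literacy plus noise, so it only illustrates the effect and says nothing about the real dataset. The helper name pearson and all constants are assumptions.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

random.seed(5)
# Synthetic data (illustrative only): log10(income) is linear in literacy plus noise.
lit = [random.uniform(20, 100) for _ in range(1000)]
log_inc = [2 + 0.03 * x + random.gauss(0, 0.2) for x in lit]
inc = [10 ** v for v in log_inc]

r_raw = pearson(lit, inc)      # literacy vs raw income
r_log = pearson(lit, log_inc)  # literacy vs log10(income)
print(round(r_raw, 2), round(r_log, 2))
```

A higher linear correlation after the transform means a straight line describes the relationship better, which is what the transformed scatterplots show visually.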