Data Analysis with Python
Cheat Sheet: Exploratory Data Analysis
Package/Method Description Code Example
Package/Method: Complete dataframe correlation
Description: Correlation matrix created using all the attributes of the dataset.
Code Example:
    df.corr()

Package/Method: Specific attribute correlation
Description: Correlation matrix created using specific attributes of the dataset.
Code Example:
    df[['attribute1','attribute2',...]].corr()

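As a minimal sketch of both correlation calls, the snippet below builds a small made-up DataFrame (the column names and values are placeholders, not taken from any real dataset) and computes the full matrix and a two-column subset.

```python
import pandas as pd

# Hypothetical toy data; names and values are illustrative only.
df = pd.DataFrame({
    "attribute_1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "attribute_2": [2.0, 4.0, 6.0, 8.0, 10.0],  # rises exactly with attribute_1
    "attribute_3": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls exactly as attribute_1 rises
})

# Correlation matrix over every numeric column.
full_corr = df.corr()

# Correlation matrix restricted to a chosen subset of columns.
subset_corr = df[["attribute_1", "attribute_3"]].corr()

print(full_corr)
print(subset_corr)
```

The result is itself a DataFrame, so a single coefficient can be read with `full_corr.loc['attribute_1', 'attribute_2']`.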
Package/Method: Scatter Plot
Description: Create a scatter plot using the data points of the independent variable along the x-axis and the dependent variable along the y-axis.
Code Example:
    from matplotlib import pyplot as plt
    plt.scatter(df['attribute_1'], df['attribute_2'])

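A self-contained version of the scatter plot might look like the sketch below. The DataFrame is invented for illustration, and the `Agg` backend is an assumption made so the code runs without a display; in a notebook that line can be dropped and `plt.show()` used instead.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove in an interactive session
from matplotlib import pyplot as plt
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "attribute_1": [1, 2, 3, 4, 5],
    "attribute_2": [2.1, 3.9, 6.2, 8.1, 9.8],
})

fig, ax = plt.subplots()
# First argument goes on the x-axis, second on the y-axis.
points = ax.scatter(df["attribute_1"], df["attribute_2"])
ax.set_xlabel("attribute_1")
ax.set_ylabel("attribute_2")
```

Selecting columns with single brackets (`df['attribute_1']`) passes one-dimensional Series, which is the shape `scatter` expects.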
Package/Method: Regression Plot
Description: Uses the dependent and independent variables in a pandas DataFrame to create a scatter plot with a generated linear regression line for the data.
Code Example:
    import seaborn as sns
    sns.regplot(x='attribute_1', y='attribute_2', data=df)

Package/Method: Box plot
Description: Create a box-and-whisker plot that uses the pandas dataframe, the dependent, and the independent variables.
Code Example:
    import seaborn as sns
    sns.boxplot(x='attribute_1', y='attribute_2', data=df)

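The regression plot and box plot can be tried side by side with a sketch like the one below. The data, the column names `category`, `x`, and `y`, and the `Agg` backend are all assumptions for illustration; `ci=None` is used here only to skip the confidence band and keep the figure simple.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove in an interactive session
from matplotlib import pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical toy data: one categorical and two numeric columns.
df = pd.DataFrame({
    "category": ["a", "a", "a", "b", "b", "b"],
    "x": [1, 2, 3, 4, 5, 6],
    "y": [1.1, 2.3, 2.9, 4.2, 4.8, 6.1],
})

fig, (ax_reg, ax_box) = plt.subplots(1, 2, figsize=(8, 3))

# Scatter of y against x with a fitted straight line drawn on top.
sns.regplot(x="x", y="y", data=df, ci=None, ax=ax_reg)

# Distribution of y within each category as a box-and-whisker plot.
sns.boxplot(x="category", y="y", data=df, ax=ax_box)
```

A regression plot suits two numeric columns, while a box plot is usually the better fit when the x-axis variable is categorical.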
Package/Method: Grouping by attributes
Description: Select a group of attributes from the dataset to create a subset of the data.
Code Example:
    df_group = df[['attribute_1','attribute_2',...]]

Package/Method: GroupBy statements
Description:
    a. Group the data by different categories of an attribute, displaying the average value of numerical attributes with the same category.
    b. Group the data by different categories of multiple attributes, displaying the average value of numerical attributes with the same category.
Code Example:
    a) df_group = df_group.groupby(['attribute_1'], as_index=False).mean()
    b) df_group = df_group.groupby(['attribute_1', 'attribute_2'], as_index=False).mean()

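The two GroupBy cases can be sketched on a small invented table as follows. The column name `value` and the data are placeholders. In recent pandas versions `mean()` raises an error if a non-numeric column is left in the group, which is why case (a) first selects only the grouping column and the numeric column.

```python
import pandas as pd

# Hypothetical toy data: two categorical columns and one numeric column.
df = pd.DataFrame({
    "attribute_1": ["a", "a", "b", "b"],
    "attribute_2": ["x", "y", "x", "y"],
    "value": [10.0, 20.0, 30.0, 50.0],
})

# (a) Average of the numeric column within each category of one attribute.
by_one = df[["attribute_1", "value"]].groupby(["attribute_1"], as_index=False).mean()

# (b) Average within each combination of two attributes.
by_two = df.groupby(["attribute_1", "attribute_2"], as_index=False).mean()

print(by_one)
print(by_two)
```

With `as_index=False` the grouping keys stay as ordinary columns, which keeps the result ready for a later pivot.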
Package/Method: Pivot Tables
Description: Create pivot tables for better representation of data based on parameters.
Code Example:
    grouped_pivot = df_group.pivot(index='attribute_1', columns='attribute_2')

Package/Method: Pseudocolor plot
Description: Create a heatmap image using a pseudocolor plot (or pcolor) using the pivot table as data.
Code Example:
    from matplotlib import pyplot as plt
    plt.pcolor(grouped_pivot, cmap='RdBu')

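The pivot and heatmap steps can be chained as in the sketch below, which starts from an already-grouped table. The data and the `value` column are invented, the `Agg` backend is an assumption for headless runs, and passing `to_numpy()` to `pcolor` is a cautious choice made here rather than something the cheat sheet requires.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove in an interactive session
from matplotlib import pyplot as plt
import pandas as pd

# Hypothetical grouped data: one row per (attribute_1, attribute_2) pair.
df_group = pd.DataFrame({
    "attribute_1": ["a", "a", "b", "b"],
    "attribute_2": ["x", "y", "x", "y"],
    "value": [10.0, 20.0, 30.0, 50.0],
})

# Rows become attribute_1 categories, columns become attribute_2 categories.
grouped_pivot = df_group.pivot(index="attribute_1", columns="attribute_2")

fig, ax = plt.subplots()
# Each cell of the pivot table is drawn as one colored rectangle.
mesh = ax.pcolor(grouped_pivot.to_numpy(), cmap="RdBu")
fig.colorbar(mesh, ax=ax)
```

`pivot` needs each index/column pair to occur only once, so it is normally applied after a GroupBy that has already aggregated duplicates.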
Package/Method: Pearson Coefficient and p-value
Description: Calculate the Pearson Coefficient and p-value of a pair of attributes.
Code Example:
    from scipy import stats
    pearson_coef, p_value = stats.pearsonr(df['attribute_1'], df['attribute_2'])

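A runnable sketch of the Pearson calculation, using invented data that is nearly linear, is shown below. A coefficient near 1 or -1 indicates a strong linear relationship, and a small p-value suggests the correlation is unlikely to be due to chance alone.

```python
import pandas as pd
from scipy import stats

# Hypothetical toy data with an almost perfectly linear relationship.
df = pd.DataFrame({
    "attribute_1": [1, 2, 3, 4, 5],
    "attribute_2": [2.1, 3.9, 6.2, 8.1, 9.8],
})

# pearsonr returns the correlation coefficient and the two-sided p-value.
pearson_coef, p_value = stats.pearsonr(df["attribute_1"], df["attribute_2"])

print("Pearson coefficient:", pearson_coef)
print("p-value:", p_value)
```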