Data Analysis with Python
Cheat Sheet: Exploratory Data Analysis
Package/Method Description Code Example

Package/Method: Complete dataframe correlation
Description: Correlation matrix created using all the attributes of the dataset.
Code Example:
    df.corr()

Package/Method: Specific attribute correlation
Description: Correlation matrix created using specific attributes of the dataset.
Code Example:
    df[['attribute_1','attribute_2',...]].corr()
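
As a rough illustration of the two correlation entries, the sketch below builds a small made-up DataFrame and computes both matrices. The column names and values are hypothetical, not from any real dataset. Selecting numeric columns first is an assumption added here, because recent pandas versions can raise an error when `corr()` meets text columns.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({
    "make": ["a", "b", "c", "d", "e"],          # non-numeric column
    "engine_size": [1.0, 1.5, 2.0, 2.5, 3.0],
    "horsepower": [70, 95, 110, 150, 180],
    "price": [9000, 12000, 15500, 21000, 26000],
})

# Correlation matrix over all numeric attributes.
corr_all = df.select_dtypes(include="number").corr()

# Correlation matrix over two chosen attributes only.
corr_pair = df[["engine_size", "price"]].corr()

print(corr_all)
print(corr_pair)
```

Each matrix is symmetric with ones on the diagonal, so only the off-diagonal values carry information.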

Package/Method: Scatter Plot
Description: Create a scatter plot of the data points, with the independent variable along the x-axis and the dependent variable along the y-axis.
Code Example:
    from matplotlib import pyplot as plt
    plt.scatter(df['attribute_1'], df['attribute_2'])
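
A minimal sketch of the scatter plot entry, assuming a small made-up DataFrame in which `engine_size` plays the independent variable and `price` the dependent one (both names are hypothetical). The `Agg` backend is chosen here only so that the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
from matplotlib import pyplot as plt
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({
    "engine_size": [1.0, 1.5, 2.0, 2.5, 3.0],
    "price": [9000, 12000, 15500, 21000, 26000],
})

# Independent variable on the x-axis, dependent variable on the y-axis.
points = plt.scatter(df["engine_size"], df["price"])
plt.xlabel("engine_size")
plt.ylabel("price")
plt.title("price vs. engine_size")
```

In an interactive session, `plt.show()` would display the figure; `plt.savefig(...)` writes it to a file instead.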

Package/Method: Regression Plot
Description: Uses the independent and dependent variables in a pandas DataFrame to create a scatter plot with a fitted linear regression line for the data.
Code Example:
    import seaborn as sns
    sns.regplot(x='attribute_1', y='attribute_2', data=df)

Package/Method: Box Plot
Description: Create a box-and-whisker plot from a pandas DataFrame, with the independent variable on the x-axis and the dependent variable on the y-axis.
Code Example:
    import seaborn as sns
    sns.boxplot(x='attribute_1', y='attribute_2', data=df)
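
The regression plot and box plot entries can be tried together on one small dataset. The sketch below is illustrative only: the DataFrame and its column names are made up, and drawing the two plots on separate figures is a choice made here, not something the cheat sheet prescribes.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
from matplotlib import pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({
    "drive_wheels": ["fwd", "fwd", "fwd", "rwd", "rwd", "rwd"],
    "engine_size": [1.0, 1.5, 2.0, 2.5, 3.0, 3.5],
    "price": [9000, 12000, 15500, 21000, 26000, 30000],
})

# Regression plot: two numeric attributes plus a fitted straight line.
plt.figure()
ax_reg = sns.regplot(x="engine_size", y="price", data=df)

# Box plot: a categorical attribute on x, a numeric attribute on y.
plt.figure()
ax_box = sns.boxplot(x="drive_wheels", y="price", data=df)
```

Both functions return the matplotlib Axes they drew on, which is convenient for adding titles or adjusting limits afterwards.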

Package/Method: Grouping by attributes
Description: Select a group of attributes from the dataset to create a subset of the data.
Code Example:
    df_group = df[['attribute_1','attribute_2',...]]

Package/Method: GroupBy statements
Description:
    a. Group the data by the categories of one attribute, displaying the average value of the numerical attributes within each category.
    b. Group the data by the categories of multiple attributes, displaying the average value of the numerical attributes within each combination of categories.
Code Example:
    a) df_group = df.groupby(['attribute_1'], as_index=False).mean()
    b) df_group = df.groupby(['attribute_1','attribute_2'], as_index=False).mean()
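
The two GroupBy variants might look like this on a tiny made-up dataset. The column names are hypothetical. The subset is selected first so that only the grouping keys and one numeric column reach `mean()`, an assumption added here because recent pandas versions can raise an error when averaging text columns.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({
    "drive_wheels": ["fwd", "fwd", "rwd", "rwd"],
    "body_style": ["sedan", "hatchback", "sedan", "sedan"],
    "price": [10.0, 20.0, 30.0, 50.0],
})

# a) Average price per category of a single attribute.
df_a = df[["drive_wheels", "price"]].groupby(
    ["drive_wheels"], as_index=False
).mean()

# b) Average price per combination of two attributes.
df_b = df[["drive_wheels", "body_style", "price"]].groupby(
    ["drive_wheels", "body_style"], as_index=False
).mean()

print(df_a)
print(df_b)
```

With `as_index=False`, the grouping keys stay as ordinary columns rather than becoming the index, which keeps the result easy to pivot later.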
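
Package/Method: Pivot Tables
Description: Create a pivot table from the grouped data, with one attribute as the rows and another as the columns, so the values are easier to read and compare.
Code Example:
    grouped_pivot = df_group.pivot(index='attribute_1', columns='attribute_2')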
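
To make the pivot step concrete, here is a small sketch that groups a made-up dataset by two attributes and then pivots the result. All names and values are hypothetical.

```python
import pandas as pd

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({
    "drive_wheels": ["fwd", "fwd", "rwd", "rwd", "rwd"],
    "body_style": ["sedan", "hatchback", "sedan", "hatchback", "sedan"],
    "price": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Average price for each (drive_wheels, body_style) combination.
df_group = df.groupby(["drive_wheels", "body_style"], as_index=False).mean()

# Rows become drive_wheels, columns become body_style.
grouped_pivot = df_group.pivot(index="drive_wheels", columns="body_style")

print(grouped_pivot)
```

Because no `values` argument is given, the resulting columns form a two-level index such as `('price', 'sedan')`. A combination that never occurs in the data would appear as `NaN`.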

Package/Method: Pseudocolor plot
Description: Create a heatmap image using a pseudocolor plot (pcolor) with the pivot table as data.
Code Example:
    from matplotlib import pyplot as plt
    plt.pcolor(grouped_pivot, cmap='RdBu')
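
A possible end-to-end sketch of the heatmap entry follows. It rebuilds a small pivot table from made-up data (all names are hypothetical) and passes it to `pcolor`. The `Agg` backend and the colorbar are additions made here for convenience.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend, no window needed
from matplotlib import pyplot as plt
import pandas as pd

# Hypothetical sample data covering every category combination.
df = pd.DataFrame({
    "drive_wheels": ["fwd", "fwd", "rwd", "rwd"],
    "body_style": ["sedan", "hatchback", "sedan", "hatchback"],
    "price": [10.0, 20.0, 30.0, 40.0],
})

df_group = df.groupby(["drive_wheels", "body_style"], as_index=False).mean()
grouped_pivot = df_group.pivot(index="drive_wheels", columns="body_style")

# One colored cell per pivot-table entry; RdBu runs from red to blue.
mesh = plt.pcolor(grouped_pivot, cmap="RdBu")
plt.colorbar(mesh)  # legend mapping colors to average price
```

The axes show cell positions rather than category names by default, so tick labels usually need to be set by hand from `grouped_pivot.index` and the column labels.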

Package/Method: Pearson Coefficient and p-value
Description: Calculate the Pearson correlation coefficient and p-value for a pair of attributes.
Code Example:
    from scipy import stats
    pearson_coef, p_value = stats.pearsonr(df['attribute_1'], df['attribute_2'])
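
As a small worked case, the sketch below applies `stats.pearsonr` to two made-up columns. The names `attribute_1` and `attribute_2` follow the cheat sheet's placeholders, and the values are invented.

```python
import pandas as pd
from scipy import stats

# Hypothetical sample data, invented for this sketch.
df = pd.DataFrame({
    "attribute_1": [1, 2, 3, 4, 5],
    "attribute_2": [2, 4, 5, 4, 5],
})

# pearsonr returns the correlation coefficient and its two-sided p-value.
pearson_coef, p_value = stats.pearsonr(df["attribute_1"], df["attribute_2"])

print("Pearson coefficient:", pearson_coef)
print("p-value:", p_value)
```

For these numbers the coefficient is 6 / sqrt(60), about 0.775. A coefficient near 1 or -1 indicates a strong linear relationship, and a small p-value suggests the correlation is unlikely to be due to chance alone.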