
Data Python

This cheat sheet provides methods and code examples for conducting exploratory data analysis using Python. It covers various techniques such as correlation matrices, scatter plots, regression plots, box plots, grouping attributes, and creating pivot tables. Additionally, it includes methods for calculating the Pearson coefficient and p-value for attribute pairs.

Uploaded by

ayushman2258r

Data Analysis with Python

Cheat Sheet: Exploratory Data Analysis

Package/Method Description Code Example

Package/Method: Complete dataframe correlation
Description: Correlation matrix created using all the attributes of the dataset.
Code Example:
    df.corr()

Package/Method: Specific Attribute correlation
Description: Correlation matrix created using specific attributes of the dataset.
Code Example:
    df[['attribute_1','attribute_2',...]].corr()
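
The two correlation entries above can be tried on a small dataframe. This is a minimal sketch: the column names and values are made up for illustration, and it assumes pandas 1.5 or later, where `corr()` accepts `numeric_only` so that text columns are skipped instead of raising an error.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "engine_size": [1.0, 1.5, 2.0, 2.5, 3.0],
    "horsepower": [70, 95, 130, 160, 200],
    "price": [9000, 12000, 17000, 21000, 27000],
    "body_style": ["hatch", "sedan", "sedan", "wagon", "sedan"],
})

# Complete dataframe correlation: every numeric attribute against every other.
# numeric_only=True leaves out the text column body_style.
full_corr = df.corr(numeric_only=True)

# Specific attribute correlation: select the columns first, then correlate.
subset_corr = df[["engine_size", "price"]].corr()

print(full_corr)
print(subset_corr)
```

Both results are square dataframes indexed by attribute name, with 1.0 on the diagonal.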

Package/Method: Scatter Plot
Description: Create a scatter plot using the data points of the independent variable along the x-axis and the dependent variable along the y-axis.
Code Example:
    from matplotlib import pyplot as plt
    plt.scatter(df['attribute_1'], df['attribute_2'])
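
A self-contained version of the scatter plot might look like the following sketch. The dataframe and its column names are hypothetical, and the `Agg` backend is selected only so the script also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
from matplotlib import pyplot as plt
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "engine_size": [1.0, 1.5, 2.0, 2.5, 3.0],
    "price": [9000, 12000, 17000, 21000, 27000],
})

fig, ax = plt.subplots()
# Independent variable on the x-axis, dependent variable on the y-axis.
points = ax.scatter(df["engine_size"], df["price"])
ax.set_xlabel("engine_size")
ax.set_ylabel("price")
```

In an interactive session, `plt.show()` would display the figure.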

Package/Method: Regression Plot
Description: Uses the dependent and independent variables in a pandas dataframe to create a scatter plot with a generated linear regression line for the data.
Code Example:
    import seaborn as sns
    sns.regplot(x='attribute_1', y='attribute_2', data=df)
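
As a rough illustration of the regression plot, the sketch below runs `sns.regplot` on made-up data. The column names are assumptions, and the `Agg` backend is used only so the code runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import pandas as pd
import seaborn as sns

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "engine_size": [1.0, 1.5, 2.0, 2.5, 3.0, 3.5],
    "price": [9000, 12500, 16000, 21000, 26000, 31500],
})

# regplot draws the scatter points and overlays a fitted linear regression line.
ax = sns.regplot(x="engine_size", y="price", data=df)
```

The returned object is a regular matplotlib `Axes`, so titles and labels can be adjusted afterwards.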

Package/Method: Box plot
Description: Create a box-and-whisker plot from a pandas dataframe, showing the distribution of the dependent variable for each category of the independent variable.
Code Example:
    import seaborn as sns
    sns.boxplot(x='attribute_1', y='attribute_2', data=df)
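
A box plot is most informative when the x attribute is categorical. The sketch below assumes a hypothetical `body_style` category and a numeric `price` column; neither name comes from the cheat sheet.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import pandas as pd
import seaborn as sns

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "body_style": ["hatch", "hatch", "hatch", "sedan", "sedan", "sedan"],
    "price": [8000, 9500, 11000, 15000, 18000, 24000],
})

# One box per body_style category, summarizing the spread of price within it.
ax = sns.boxplot(x="body_style", y="price", data=df)
```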

Package/Method: Grouping by attributes
Description: Select a group of different attributes of a dataset to create a subset of the data.
Code Example:
    df_group = df[['attribute_1','attribute_2',...]]
Package/Method: GroupBy statements
Description:
    a. Group the data by different categories of an attribute, displaying the average value of numerical attributes with the same category.
    b. Group the data by different categories of multiple attributes, displaying the average value of numerical attributes with the same category.
Code Example:
    a) df_group = df.groupby(['attribute_1'], as_index=False).mean()
    b) df_group = df.groupby(['attribute_1','attribute_2'], as_index=False).mean()
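
The subset selection and the two GroupBy variants can be combined as in the sketch below. The data and column names are hypothetical. Note that in pandas 2.x, `mean()` raises an error if a non-grouping text column is still present, so variant (a) passes `numeric_only=True` here; that argument is an addition to the cheat sheet's example.

```python
import pandas as pd

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "drive_wheels": ["fwd", "fwd", "rwd", "rwd", "4wd"],
    "body_style": ["sedan", "hatch", "sedan", "sedan", "wagon"],
    "price": [10000, 8000, 20000, 24000, 15000],
    "color": ["red", "blue", "red", "black", "white"],
})

# Grouping by attributes: keep only the columns of interest.
df_group = df[["drive_wheels", "body_style", "price"]]

# (a) Average price per drive_wheels category; body_style is text, so skip it.
by_one = df_group.groupby(["drive_wheels"], as_index=False).mean(numeric_only=True)

# (b) Average price per (drive_wheels, body_style) combination.
by_two = df_group.groupby(["drive_wheels", "body_style"], as_index=False).mean()

print(by_one)
print(by_two)
```

With `as_index=False`, the grouping attributes stay as ordinary columns instead of becoming the index, which is convenient for a later pivot.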

Package/Method: Pivot Tables
Description: Create Pivot tables for better representation of data based on parameters.
Code Example:
    grouped_pivot = df_group.pivot(index='attribute_1', columns='attribute_2')
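
To show what the pivot produces, here is a minimal sketch that starts from a grouped dataframe with one row per category pair. The names `drive_wheels`, `body_style`, and `price` are assumptions made for the example.

```python
import pandas as pd

# Hypothetical grouped data: one row per (drive_wheels, body_style) pair.
df_group = pd.DataFrame({
    "drive_wheels": ["4wd", "fwd", "fwd", "rwd"],
    "body_style": ["wagon", "hatch", "sedan", "sedan"],
    "price": [15000.0, 8000.0, 10000.0, 22000.0],
})

# Rows become drive_wheels values, columns become body_style values.
grouped_pivot = df_group.pivot(index="drive_wheels", columns="body_style")

# Combinations that never occur in the data show up as NaN;
# filling them with 0 is one possible choice, not the only one.
grouped_pivot_filled = grouped_pivot.fillna(0)

print(grouped_pivot)
```

Because no `values` argument is given, the resulting columns form a two-level index such as `('price', 'sedan')`.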

Package/Method: Pseudocolor plot
Description: Create a heatmap image using a pseudocolor plot (or pcolor) using the pivot table as data.
Code Example:
    from matplotlib import pyplot as plt
    plt.pcolor(grouped_pivot, cmap='RdBu')
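
A heatmap of a pivot table could be drawn roughly as follows. The pivot table here is built by hand with hypothetical labels, and the tick labelling and colorbar are optional additions that are not part of the cheat sheet's one-line example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
from matplotlib import pyplot as plt
import pandas as pd

# Hypothetical pivot table, for illustration only.
grouped_pivot = pd.DataFrame(
    [[9000.0, 12000.0], [15000.0, 22000.0]],
    index=pd.Index(["fwd", "rwd"], name="drive_wheels"),
    columns=pd.Index(["hatch", "sedan"], name="body_style"),
)

fig, ax = plt.subplots()
# Each cell of the pivot table becomes one colored rectangle.
mesh = ax.pcolor(grouped_pivot, cmap="RdBu")
fig.colorbar(mesh, ax=ax)

# Center the tick labels on the cells (cell i spans i to i + 1).
ax.set_xticks([0.5, 1.5])
ax.set_xticklabels(grouped_pivot.columns)
ax.set_yticks([0.5, 1.5])
ax.set_yticklabels(grouped_pivot.index)
```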

Package/Method: Pearson Coefficient and p-value
Description: Calculate the Pearson Coefficient and p-value of a pair of attributes.
Code Example:
    from scipy import stats
    pearson_coef, p_value = stats.pearsonr(df['attribute_1'], df['attribute_2'])
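
As a small worked example, the sketch below computes the Pearson coefficient and p-value for two made-up columns that rise together almost linearly. The column names and values are assumptions.

```python
import pandas as pd
from scipy import stats

# Hypothetical data, for illustration only.
df = pd.DataFrame({
    "engine_size": [1.0, 1.5, 2.0, 2.5, 3.0],
    "price": [9000, 12000, 17000, 21000, 27000],
})

# pearsonr returns the correlation coefficient and the two-sided p-value.
pearson_coef, p_value = stats.pearsonr(df["engine_size"], df["price"])

print("Pearson coefficient:", pearson_coef)
print("p-value:", p_value)
```

A coefficient close to 1 or -1 indicates a strong linear relationship, and a small p-value suggests that the correlation is unlikely to be due to chance alone.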
