EDA All Functions

The document discusses various exploratory data analysis functions in pandas such as df.head(), df.tail(), df.info(), df.describe(), df.shape, df.columns, and df.dtypes. It provides examples of using each function on a sample DataFrame and describes what each function shows.

3/25/24, 10:42 PM EDA All Functions

Notebook by Tariq Ahmed (WP: +923070996076)
1. df.head(n): Returns the first n rows of the DataFrame (five by default).
2. df.tail(n): Returns the last n rows of the DataFrame (five by default).
3. df.info(): Provides information about the DataFrame, including column names, data types,
and non-null value counts.
4. df.describe(): Computes descriptive statistics for the numerical columns in the
DataFrame, such as count, mean, standard deviation, and percentiles.
5. df.shape: Returns the dimensions (number of rows and columns) of the DataFrame.
6. df.columns: Returns the column names of the DataFrame.
7. df.dtypes: Returns the data type of each column in the DataFrame.
8. df.isnull(): Checks for missing values and returns a DataFrame of the same shape with
True/False values indicating the presence of missing values.
9. df.dropna(): Returns a copy of the DataFrame with the rows that contain missing values removed.
10. df.fillna(value): Fills missing values in the DataFrame with a specified value.
11. df.groupby(by): Groups the DataFrame by one or more columns and returns a GroupBy
object for further aggregation and analysis.
12. df.sort_values(by): Sorts the DataFrame by one or more columns.
13. df.merge(df2): Merges two DataFrames based on common columns or indices.
14. df.pivot_table(values, index, columns): Creates a pivot table from the DataFrame,
aggregating values based on specified columns.
15. df.apply(func): Applies a function along an axis of the DataFrame, that is, to each
column (the default) or to each row.
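
Items 13 to 15 in the list above are not demonstrated in the rest of this notebook. The following is a minimal sketch of how they might be used. The two small DataFrames (orders, customers) and all of their values are made up for illustration and are not taken from the Diwali sales file.

```python
import pandas as pd

# Hypothetical sample data (not from the notebook's CSV file)
orders = pd.DataFrame({
    "User_ID": [1, 2, 3, 1],
    "Zone": ["Western", "Southern", "Western", "Western"],
    "Gender": ["F", "M", "F", "F"],
    "Amount": [100.0, 200.0, 300.0, 50.0],
})
customers = pd.DataFrame({
    "User_ID": [1, 2, 3],
    "Cust_name": ["Asha", "Ravi", "Meena"],
})

# 13. merge: attach the customer name to every order via the shared User_ID column
merged = orders.merge(customers, on="User_ID", how="left")

# 14. pivot_table: total Amount for each Zone (rows) and Gender (columns)
pivot = orders.pivot_table(values="Amount", index="Zone", columns="Gender", aggfunc="sum")

# 15. apply: by default the function receives one column at a time ...
col_ranges = orders[["Amount"]].apply(lambda col: col.max() - col.min())
# ... and with axis=1 it receives one row at a time
row_labels = orders.apply(lambda row: f"{row['Zone']}-{row['Gender']}", axis=1)

print(merged)
print(pivot)
print(col_ranges)
print(row_labels)
```

Combinations that do not occur in the data (for example Southern and F here) appear as NaN in the pivot table.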

Import libraries
In [2]: import pandas as pd

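Find the encoding used to open the CSV file
Printing the file object shows the encoding Python chose when opening the file (the platform default, cp1252 here). It is not an encoding detected from the file's contents.

In [5]: with open('Diwali Sales [Link]') as f:
    print(f)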

<_io.TextIOWrapper name='Diwali Sales [Link]' mode='r' encoding='cp1252'>
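
Since the file object only reports the encoding Python opened the file with, one way to check which encodings can actually decode a file is to try a few candidates in turn. The sketch below is an illustration only: the helper name first_working_encoding, the candidate list, and the temporary demo file are all assumptions, not part of the original notebook. A successful decode does not prove the encoding is the right one (latin-1 accepts any byte sequence), so the result should be treated as a hint.

```python
import tempfile
from pathlib import Path

def first_working_encoding(path, candidates=("utf-8", "cp1252", "latin-1")):
    """Return the first candidate encoding that decodes the whole file, or None."""
    raw = Path(path).read_bytes()
    for enc in candidates:
        try:
            raw.decode(enc)
        except UnicodeDecodeError:
            continue  # this encoding cannot read the file; try the next one
        return enc
    return None

# Hypothetical demo file: the byte for "é" in cp1252 is not valid UTF-8 here
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "encoding_demo.csv"
    demo_path.write_bytes("name,price\ncaf\xe9,10\n".encode("cp1252"))
    detected = first_working_encoding(demo_path)

print(detected)  # cp1252
```

The returned name can then be passed as the encoding argument of pd.read_csv.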

Import the CSV file

In [6]: df = pd.read_csv('Diwali Sales [Link]', encoding='cp1252')

localhost:8888/nbconvert/html/Class EDA/EDA All [Link]?download=false 1/9


df.head() displays the first few rows (five by default) of a DataFrame object in pandas,
which is a popular data manipulation and analysis library.
In [7]: df.head()

Out[7]:
   User_ID  Cust_name  Product_ID  Gender  Age Group  Age  Marital_Status           State      Zone  ...
0  1002903  Sanskriti   P00125942       F      26-35   28               0     Maharashtra   Western  ...
1  1000732     Kartik   P00110942       F      26-35   35               1  Andhra Pradesh  Southern  ...
2  1001990      Bindu   P00118542       F      26-35   35               1   Uttar Pradesh   Central  ...
3  1001425     Sudevi   P00237842       M       0-17   16               0       Karnataka  Southern  ...
4  1000588       Joni   P00057942       M      26-35   28               1         Gujarat   Western  ...

df.tail() returns the last five rows of the DataFrame by default.
In [10]: df.tail()

Out[10]:
       User_ID    Cust_name  Product_ID  Gender  Age Group  Age  Marital_Status           State      Zone  ...
11246  1000695      Manning   P00296942       M      18-25   19               1     Maharashtra   Western  ...
11247  1004089  Reichenbach   P00171342       M      26-35   33               0         Haryana  Northern  ...
11248  1001209        Oshin   P00201342       F      36-45   40               0  Madhya Pradesh   Central  ...
11249  1004023       Noonan   P00059442       M      36-45   37               0       Karnataka  Southern  ...
11250  1002744      Brumley   P00281742       F      18-25   19               0     Maharashtra   Western  ...

df.info() provides a summary of the DataFrame, including the following information:
The total number of rows and columns in the DataFrame. The column names and their
corresponding data types. The count of non-null values in each column. The memory usage of
the DataFrame.

In [8]: df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 11251 entries, 0 to 11250
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 User_ID 11251 non-null int64
1 Cust_name 11251 non-null object
2 Product_ID 11251 non-null object
3 Gender 11251 non-null object
4 Age Group 11251 non-null object
5 Age 11251 non-null int64
6 Marital_Status 11251 non-null int64
7 State 11251 non-null object
8 Zone 11251 non-null object
9 Occupation 11251 non-null object
10 Product_Category 11251 non-null object
11 Orders 11251 non-null int64
12 Amount 11239 non-null float64
13 Status 0 non-null float64
14 unnamed1 0 non-null float64
dtypes: float64(3), int64(4), object(8)
memory usage: 1.3+ MB

df.describe() generates descriptive statistics for the numerical columns of a DataFrame,
such as count, mean, standard deviation, minimum value, quartiles, and maximum value.
In [9]: df.describe()

Out[9]: User_ID Age Marital_Status Orders Amount Status unnamed1

count 1.125100e+04 11251.000000 11251.000000 11251.000000 11239.000000 0.0 0.0

mean 1.003004e+06 35.421207 0.420318 2.489290 9453.610858 NaN NaN

std 1.716125e+03 12.754122 0.493632 1.115047 5222.355869 NaN NaN

min 1.000001e+06 12.000000 0.000000 1.000000 188.000000 NaN NaN

25% 1.001492e+06 27.000000 0.000000 1.500000 5443.000000 NaN NaN

50% 1.003065e+06 33.000000 0.000000 2.000000 8109.000000 NaN NaN

75% 1.004430e+06 43.000000 1.000000 3.000000 12675.000000 NaN NaN

max 1.006040e+06 92.000000 1.000000 4.000000 23952.000000 NaN NaN
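
By default df.describe() summarises only the numeric columns, so text columns such as Gender or Zone do not appear in the output above. A minimal sketch of one way to summarise the non-numeric columns instead, using a small made-up DataFrame (the values are illustrative, not taken from the sales file):

```python
import numpy as np
import pandas as pd

# Hypothetical sample with two text columns and one numeric column
sample = pd.DataFrame({
    "Gender": ["F", "F", "M", "M", "F"],
    "Zone": ["Western", "Southern", "Central", "Southern", "Western"],
    "Amount": [500.0, 750.0, 120.0, 980.0, 300.0],
})

# Excluding numeric dtypes leaves the text columns, which are summarised
# with count, unique, top (most frequent value) and freq (its frequency)
cat_summary = sample.describe(exclude=[np.number])
print(cat_summary)
```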


df.shape returns the dimensions (number of rows and columns) of the DataFrame.
In [11]: df.shape

Out[11]: (11251, 15)

df.columns shows the column names of the DataFrame.
In [13]: df.columns

Out[13]:
Index(['User_ID', 'Cust_name', 'Product_ID', 'Gender', 'Age Group', 'Age',
       'Marital_Status', 'State', 'Zone', 'Occupation', 'Product_Category',
       'Orders', 'Amount', 'Status', 'unnamed1'],
      dtype='object')

df.dtypes shows the data type of each column in the DataFrame.
In [14]: df.dtypes

Out[14]:
User_ID int64
Cust_name object
Product_ID object
Gender object
Age Group object
Age int64
Marital_Status int64
State object
Zone object
Occupation object
Product_Category object
Orders int64
Amount float64
Status float64
unnamed1 float64
dtype: object

df.isnull(): Checks for missing values and returns a DataFrame of the same shape with
True/False values indicating the presence of missing values.
In [15]: df.isnull()


Out[15]:
       User_ID  Cust_name  Product_ID  Gender  Age Group    Age  Marital_Status  State   Zone  Occupation  ...
    0    False      False       False   False      False  False           False  False  False       False  ...
    1    False      False       False   False      False  False           False  False  False       False  ...
    2    False      False       False   False      False  False           False  False  False       False  ...
    3    False      False       False   False      False  False           False  False  False       False  ...
    4    False      False       False   False      False  False           False  False  False       False  ...
  ...      ...        ...         ...     ...        ...    ...             ...    ...    ...         ...  ...
11246    False      False       False   False      False  False           False  False  False       False  ...
11247    False      False       False   False      False  False           False  False  False       False  ...
11248    False      False       False   False      False  False           False  False  False       False  ...
11249    False      False       False   False      False  False           False  False  False       False  ...
11250    False      False       False   False      False  False           False  False  False       False  ...

11251 rows × 15 columns

df.isnull().sum() checks for missing values and counts the nulls in each column.
In [16]: df.isnull().sum()

Out[16]:
User_ID 0
Cust_name 0
Product_ID 0
Gender 0
Age Group 0
Age 0
Marital_Status 0
State 0
Zone 0
Occupation 0
Product_Category 0
Orders 0
Amount 12
Status 11251
unnamed1 11251
dtype: int64
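
Raw null counts are easier to judge as a share of all rows. A minimal sketch, assuming a small made-up DataFrame rather than the sales file: because True counts as 1 and False as 0, the mean of df.isnull() for a column is the fraction of missing values in it.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: one complete column, one partly missing, one fully missing
sample = pd.DataFrame({
    "Orders": [1, 3, 2, 2],
    "Amount": [100.0, np.nan, 250.0, 80.0],
    "Status": [np.nan, np.nan, np.nan, np.nan],
})

# Percentage of missing values per column
missing_pct = sample.isnull().mean() * 100
print(missing_pct)
```

A column that is 100% missing (like Status and unnamed1 in the output above) carries no information and is a natural candidate for removal.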

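df.dropna(): Removes rows with missing values from the DataFrame.
Here every row has a missing value in the Status and unnamed1 columns, so the result has no rows left.
In [17]: df.dropna()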


Out[17]:
User_ID  Cust_name  Product_ID  Gender  Age Group  Age  Marital_Status  State  Zone  Occupation  ...
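
Because df.dropna() drops a row if any of its columns is missing, a single all-null column empties the whole result, as seen above. A minimal sketch of two narrower alternatives, on a made-up DataFrame (the data is hypothetical; subset, axis and how are standard dropna parameters):

```python
import numpy as np
import pandas as pd

# Hypothetical sample: Status is entirely missing, Amount has one gap
sample = pd.DataFrame({
    "Orders": [1, 3, 2, 2],
    "Amount": [100.0, np.nan, 250.0, 80.0],
    "Status": [np.nan, np.nan, np.nan, np.nan],
})

# Default behaviour: every row has a NaN in Status, so nothing survives
all_dropped = sample.dropna()

# Only consider the Amount column when deciding which rows to drop
rows_with_amount = sample.dropna(subset=["Amount"])

# Drop columns (axis=1) only if all of their values are missing
without_empty_cols = sample.dropna(axis=1, how="all")

print(len(all_dropped), len(rows_with_amount), list(without_empty_cols.columns))
```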

df.drop('column name', axis=1, inplace=True)
Removes the named column from the DataFrame. Here it drops 'unnamed1', which contains only missing values.
In [23]: df.drop('unnamed1', axis=1, inplace=True)

df.fillna(value): Fills missing values in the DataFrame with a specified value.
In [27]: # Fill missing values with a constant value
df_filled = df.fillna(0)
df_filled

Out[27]:
       User_ID    Cust_name  Product_ID  Gender  Age Group  Age  Marital_Status           State      Zone  ...
    0  1002903    Sanskriti   P00125942       F      26-35   28               0     Maharashtra   Western  ...
    1  1000732       Kartik   P00110942       F      26-35   35               1  Andhra Pradesh  Southern  ...
    2  1001990        Bindu   P00118542       F      26-35   35               1   Uttar Pradesh   Central  ...
    3  1001425       Sudevi   P00237842       M       0-17   16               0       Karnataka  Southern  ...
    4  1000588         Joni   P00057942       M      26-35   28               1         Gujarat   Western  ...
  ...      ...          ...         ...     ...        ...  ...             ...             ...       ...  ...
11246  1000695      Manning   P00296942       M      18-25   19               1     Maharashtra   Western  ...
11247  1004089  Reichenbach   P00171342       M      26-35   33               0         Haryana  Northern  ...
11248  1001209        Oshin   P00201342       F      36-45   40               0  Madhya Pradesh   Central  ...
11249  1004023       Noonan   P00059442       M      36-45   37               0       Karnataka  Southern  ...
11250  1002744      Brumley   P00281742       F      18-25   19               0     Maharashtra   Western  ...

11251 rows × 14 columns

Fill missing values with the mean of the column

df['column name'] = df['column name'].fillna(df['column name'].mean())
In [33]: # Assign the result back to the column; calling fillna(..., inplace=True) on a
# selected column may not update df in recent pandas versions
df['Amount'] = df['Amount'].fillna(df['Amount'].mean())

In [34]: # Check that the missing values were filled
df.isnull().sum()

Out[34]:
User_ID 0
Cust_name 0
Product_ID 0
Gender 0
Age Group 0
Age 0
Marital_Status 0
State 0
Zone 0
Occupation 0
Product_Category 0
Orders 0
Amount 0
Status 11251
dtype: int64
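
Different columns often need different fill values. As a minimal sketch, fillna also accepts a dictionary that maps column names to fill values; the sample data, the choice of the median, and the "Unknown" label below are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample with a gap in a numeric column and in a text column
sample = pd.DataFrame({
    "Amount": [100.0, np.nan, 300.0],
    "Zone": ["Western", None, "Central"],
})

# One fill value per column: the median for Amount, a placeholder label for Zone
filled = sample.fillna({
    "Amount": sample["Amount"].median(),
    "Zone": "Unknown",
})
print(filled)
```

The median is less sensitive to extreme values than the mean, which can matter for skewed columns such as purchase amounts.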

df.groupby(by) groups a DataFrame by one or more columns.
It splits the DataFrame into groups based on the unique values in the
specified column(s) so that operations can be performed
on each group independently.
In [53]: grouped = df.groupby(['Product_ID', 'Cust_name'])
mean_age = grouped['Age'].mean()
print(mean_age)

Product_ID  Cust_name
P00000142   Adrian       19.0
            Akshat       27.0
            Armstrong    34.0
            Arun         33.0
            Atkinson     46.0
                         ...
P0099442    Amol         26.0
            Astrea       35.0
            Grant        32.0
            Siddharth    36.0
P0099742    Shatayu      13.0
Name: Age, Length: 10948, dtype: float64

In [54]: # in one line
mean_values = df.groupby(['Product_ID', 'Cust_name'])['Age'].mean()
mean_values

Out[54]:
Product_ID  Cust_name
P00000142   Adrian       19.0
            Akshat       27.0
            Armstrong    34.0
            Arun         33.0
            Atkinson     46.0
                         ...
P0099442    Amol         26.0
            Astrea       35.0
            Grant        32.0
            Siddharth    36.0
P0099742    Shatayu      13.0
Name: Age, Length: 10948, dtype: float64
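
A GroupBy object can also compute several statistics at once. The following is a minimal sketch using named aggregation in agg; the sample rows and the output column names (mean_age, total_amount, n_orders) are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data (not from the sales file)
sample = pd.DataFrame({
    "Zone": ["Western", "Southern", "Western", "Central", "Southern"],
    "Age": [28, 35, 19, 40, 37],
    "Amount": [100.0, 200.0, 300.0, 400.0, 600.0],
})

# Each keyword names an output column and gives (input column, aggregation)
summary = sample.groupby("Zone").agg(
    mean_age=("Age", "mean"),
    total_amount=("Amount", "sum"),
    n_orders=("Amount", "count"),
)
print(summary)
```

The result is a DataFrame indexed by Zone with one column per requested statistic.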

df.sort_values(by): Sorts the DataFrame by one or more columns.
In [59]: #df.sort_values(by='Column1') # Sort by a single column

df.sort_values(by='Amount')

Out[59]:
       User_ID    Cust_name  Product_ID  Gender  Age Group  Age  Marital_Status           State      Zone  ...
11250  1002744      Brumley   P00281742       F      18-25   19               0     Maharashtra   Western  ...
11249  1004023       Noonan   P00059442       M      36-45   37               0       Karnataka  Southern  ...
11248  1001209        Oshin   P00201342       F      36-45   40               0  Madhya Pradesh   Central  ...
11247  1004089  Reichenbach   P00171342       M      26-35   33               0         Haryana  Northern  ...
11246  1000695      Manning   P00296942       M      18-25   19               1     Maharashtra   Western  ...
  ...      ...          ...         ...     ...        ...  ...             ...             ...       ...  ...
    4  1000588         Joni   P00057942       M      26-35   28               1         Gujarat   Western  ...
    3  1001425       Sudevi   P00237842       M       0-17   16               0       Karnataka  Southern  ...
    2  1001990        Bindu   P00118542       F      26-35   35               1   Uttar Pradesh   Central  ...
    1  1000732       Kartik   P00110942       F      26-35   35               1  Andhra Pradesh  Southern  ...
    0  1002903    Sanskriti   P00125942       F      26-35   28               0     Maharashtra   Western  ...

11251 rows × 14 columns

In [56]: # Sort by multiple columns

#df.sort_values(by=['Column1', 'Column2'])
df.sort_values(by=['Age', 'Amount'])


Out[56]:
       User_ID  Cust_name  Product_ID  Gender  Age Group  Age  Marital_Status           State      Zone  ...
11240  1001425     Sudevi   P00044742       F       0-17   12               0           Delhi   Central  ...
11109  1004135    Jayanti   P00229742       F       0-17   12               0           Delhi   Central  ...
10804  1001673    Lampkin   P00277442       F       0-17   12               0         Gujarat   Western  ...
10774  1001926     Barton   P00157542       M       0-17   12               1  Madhya Pradesh   Central  ...
 9505  1005403   Caroline   P00195742       M       0-17   12               1         Haryana  Northern  ...
  ...      ...        ...         ...     ...        ...  ...             ...             ...       ...  ...
 2951  1002204    Dilbeck   P00246642       M        55+   92               0  Madhya Pradesh   Central  ...
 2698  1005658    Poirier   P00227942       M        55+   92               0       Karnataka  Southern  ...
 1106  1001176      Alice   P00128942       M        55+   92               0   Uttar Pradesh   Central  ...
  612  1002526     Shreya   P00271142       M        55+   92               1   Uttar Pradesh   Central  ...
  359  1003036   Prescott   P00255842       F        55+   92               0     Uttarakhand   Central  ...

11251 rows × 14 columns
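
Both sorts above use the default ascending order. A minimal sketch of the ascending parameter, which takes either a single boolean or one boolean per sort column; the sample rows are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data (not from the sales file)
sample = pd.DataFrame({
    "Cust_name": ["A", "B", "C", "D"],
    "Age": [28, 35, 28, 16],
    "Amount": [300.0, 150.0, 450.0, 90.0],
})

# Largest Amount first
largest_first = sample.sort_values(by="Amount", ascending=False)

# Youngest first; within the same Age, largest Amount first
mixed = sample.sort_values(by=["Age", "Amount"], ascending=[True, False])

print(largest_first)
print(mixed)
```

sort_values returns a new DataFrame and keeps the original index labels, which is why the index column in the outputs above is no longer in order.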
