Data Analysis (Pandas)

The document discusses four methods for creating frequency tables of columns in pandas: 1) using the value_counts() function, 2) using the crosstab() function, 3) using the groupby() and count() functions, and 4) creating a two way frequency table using crosstab(). Examples are provided for each method.

Uploaded by

shweta mishra

To create a frequency table of a column in pandas, we can use the value_counts() function. The crosstab() function in pandas can also be used to get a cross table or frequency table. Let's see how to create a frequency table of a column in pandas.

• Frequency table in pandas using the value_counts() function
• Frequency table in pandas using the crosstab() function
• Frequency count of a DataFrame column using groupby() and count()
• Two-way frequency table using the crosstab() function
• Two-way frequency table as proportions (overall, row, and column proportions)
First, let's create a DataFrame:

import pandas as pd
import numpy as np

data = {'Product': ['Box', 'Bottles', 'Pen', 'Markers', 'Bottles', 'Pen',
                    'Markers', 'Bottles', 'Box', 'Markers', 'Markers', 'Pen'],
        'State': ['Alaska', 'California', 'Texas', 'North Carolina', 'California', 'Texas',
                  'Alaska', 'Texas', 'North Carolina', 'Alaska', 'California', 'Texas'],
        'Sales': [14, 24, 31, 12, 13, 7, 9, 31, 18, 16, 18, 14]}

df1 = pd.DataFrame(data, columns=['Product', 'State', 'Sales'])
df1

The resulting DataFrame df1 has 12 rows and three columns: Product, State, and Sales.

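Get a frequency table of a column in pandas: Method 1

A frequency table for the State column can be created by selecting the column as an attribute and calling value_counts(), as shown below.

df1.State.value_counts()

The result lists each state with the number of rows in which it appears, most frequent first: Texas appears 4 times, Alaska and California 3 times each, and North Carolina twice.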

Get a frequency table of a column in pandas: Method 2

The same frequency table for the State column can be created by selecting the column with bracket notation and calling value_counts(), as shown below.

df1['State'].value_counts()

The resulting frequency table is identical to the one from Method 1.
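As a small extension of the value_counts() approach, the sketch below also asks for relative frequencies through the normalize=True argument. It rebuilds only the State column from the example data; the variable names counts and shares are illustrative and not part of the original tutorial.

```python
import pandas as pd

# Same State values as in the df1 example
states = ['Alaska', 'California', 'Texas', 'North Carolina', 'California', 'Texas',
          'Alaska', 'Texas', 'North Carolina', 'Alaska', 'California', 'Texas']
df1 = pd.DataFrame({'State': states})

# Absolute counts per state, most frequent first
counts = df1['State'].value_counts()

# Relative frequencies: each count divided by the number of rows
shares = df1['State'].value_counts(normalize=True)

print(counts)
print(shares.round(3))
```

The relative frequencies sum to 1, which makes them easier to compare across columns of different lengths.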

Get a frequency table of a column in pandas: Method 3, crosstab()

A frequency table for the State column can also be created using the crosstab() function, as shown below. Here crosstab() takes the column as its index argument and counts how often each of its values occurs.

### frequency table using crosstab() function

import pandas as pd
my_tab = pd.crosstab(index=df1["State"], columns="count")
my_tab

The result is a table indexed by State with a single column named count.
Get a frequency table of a column in pandas: Method 4, groupby() and count()

The groupby() function takes the column name as its argument. Calling count() on a selected column of the grouped data then gives the number of non-null values in each group, which serves as the frequency table of the column.

#### Get frequency table of the column using groupby() and count()

df1.groupby(['State'])['Sales'].count()

The result is a Series indexed by State that holds the frequency of each state.
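The three approaches above can be checked against each other. The sketch below does that on the example data and then adds a hypothetical variation, not present in the original tutorial, in which one Sales value is missing: count() skips missing values, while size() counts every row. All variable names are illustrative.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'State': ['Alaska', 'California', 'Texas', 'North Carolina', 'California', 'Texas',
              'Alaska', 'Texas', 'North Carolina', 'Alaska', 'California', 'Texas'],
    'Sales': [14, 24, 31, 12, 13, 7, 9, 31, 18, 16, 18, 14],
})

# Three ways to count rows per state, all ordered alphabetically by state
by_value_counts = df1['State'].value_counts().sort_index()
by_crosstab = pd.crosstab(index=df1['State'], columns='count')['count']
by_groupby = df1.groupby('State')['Sales'].count()

# The counts agree as long as Sales has no missing values
same = bool(
    (by_value_counts.to_numpy() == by_crosstab.to_numpy()).all()
    and (by_value_counts.to_numpy() == by_groupby.to_numpy()).all()
)
print(same)

# Hypothetical variation: blank out the Sales value of the first (Alaska) row
df_missing = df1.copy()
df_missing['Sales'] = df_missing['Sales'].astype(float)
df_missing.loc[0, 'Sales'] = np.nan

count_non_null = df_missing.groupby('State')['Sales'].count()  # ignores NaN
size_all_rows = df_missing.groupby('State').size()             # counts every row
print(count_non_null['Alaska'], size_all_rows['Alaska'])
```

If the goal is a pure frequency table, size() or value_counts() may be the safer choice when the counted column can contain missing values.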

Two-way frequency table using the crosstab() function:

A two-way frequency table for the "State" and "Product" columns can be created using the crosstab() function, as shown below. The "State" column is passed as the index and the "Product" column as the columns, and crosstab() counts the frequency of each combination.

### two-way frequency table using crosstab() function

import pandas as pd

my_crosstab = pd.crosstab(index=df1["State"],
                          columns=df1["Product"],
                          margins=True)  # include row and column totals
my_crosstab

The resulting two-way frequency table has one row per state, one column per product, and an All row and column that hold the totals.
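The overview list also mentions two-way tables expressed as proportions. One possible way to get them, sketched here under the assumption that the normalize argument of crosstab() is what was intended, is to pass 'all', 'index', or 'columns'. The variable names overall, row_prop, and col_prop are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Product': ['Box', 'Bottles', 'Pen', 'Markers', 'Bottles', 'Pen',
                'Markers', 'Bottles', 'Box', 'Markers', 'Markers', 'Pen'],
    'State': ['Alaska', 'California', 'Texas', 'North Carolina', 'California', 'Texas',
              'Alaska', 'Texas', 'North Carolina', 'Alaska', 'California', 'Texas'],
})

# Overall proportions: every cell divided by the total number of rows
overall = pd.crosstab(df1['State'], df1['Product'], normalize='all')

# Row proportions: each State row sums to 1
row_prop = pd.crosstab(df1['State'], df1['Product'], normalize='index')

# Column proportions: each Product column sums to 1
col_prop = pd.crosstab(df1['State'], df1['Product'], normalize='columns')

print(row_prop.round(2))
print(col_prop.round(2))
```

Row proportions answer "within this state, what share of rows is each product?", while column proportions answer "for this product, what share of rows comes from each state?".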

How to Calculate Conditional Probability in Python

The conditional probability that event A occurs, given that event B has occurred, is calculated as follows:

P(A|B) = P(A∩B) / P(B)

where:
P(A∩B) = the probability that event A and event B both occur.
P(B) = the probability that event B occurs.

The following example shows how to use this formula to calculate conditional probabilities in Python.

Example: Calculate Conditional Probability in Python

Suppose we send out a survey to 300 individuals asking them which sport they like best: baseball, basketball, football, or soccer.

We can create the following table in Python to hold the survey responses:
import pandas as pd
import numpy as np

#create pandas DataFrame with raw data
df = pd.DataFrame({'gender': np.repeat(np.array(['Male', 'Female']), 150),
                   'sport': np.repeat(np.array(['Baseball', 'Basketball', 'Football',
                                                'Soccer', 'Baseball', 'Basketball',
                                                'Football', 'Soccer']),
                                      (34, 40, 58, 18, 34, 52, 20, 44))})

#produce contingency table to summarize raw data
survey_data = pd.crosstab(index=df['gender'], columns=df['sport'], margins=True)

#view contingency table
survey_data

sport   Baseball  Basketball  Football  Soccer  All
gender
Female        34          52        20      44  150
Male          34          40        58      18  150
All           68          92        78      62  300
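To connect the contingency table back to the formula P(A|B) = P(A∩B) / P(B), the sketch below reads the needed counts from the table and computes two conditional probabilities. The choice of events (Male given Baseball, Basketball given Female) and the variable names are illustrative assumptions, not taken from the original text.

```python
import numpy as np
import pandas as pd

# Rebuild the survey data and its contingency table
df = pd.DataFrame({
    'gender': np.repeat(np.array(['Male', 'Female']), 150),
    'sport': np.repeat(np.array(['Baseball', 'Basketball', 'Football', 'Soccer',
                                 'Baseball', 'Basketball', 'Football', 'Soccer']),
                       (34, 40, 58, 18, 34, 52, 20, 44)),
})
survey_data = pd.crosstab(index=df['gender'], columns=df['sport'], margins=True)

total = survey_data.loc['All', 'All']  # number of respondents

# P(Male | Baseball) = P(Male and Baseball) / P(Baseball)
p_male_and_baseball = survey_data.loc['Male', 'Baseball'] / total
p_baseball = survey_data.loc['All', 'Baseball'] / total
p_male_given_baseball = p_male_and_baseball / p_baseball

# P(Basketball | Female) = P(Female and Basketball) / P(Female)
p_female_and_basketball = survey_data.loc['Female', 'Basketball'] / total
p_female = survey_data.loc['Female', 'All'] / total
p_basketball_given_female = p_female_and_basketball / p_female

print(round(p_male_given_baseball, 4))      # 34 / 68
print(round(p_basketball_given_female, 4))  # 52 / 150
```

Because the total cancels out, each conditional probability reduces to a cell count divided by the matching row or column total in the table.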

How to get the correlation between two columns in Pandas?

We can use the .corr() method to get the correlation between two columns in Pandas. Let's take an example and see how to apply this method.

Steps
• Create a DataFrame, df (two-dimensional, size-mutable, potentially heterogeneous tabular data).
• Print the input DataFrame, df.
• Initialize two variables, col1 and col2, and assign them the names of the columns whose correlation you want to find.
• Find the correlation between col1 and col2 by using df[col1].corr(df[col2]) and save the correlation value in a variable, corr.
• Print the correlation value, corr.

Example

import pandas as pd

df = pd.DataFrame(
    {
        "x": [5, 2, 7, 0],
        "y": [4, 7, 5, 1],
        "z": [9, 3, 5, 1]
    }
)
print("Input DataFrame is:")
print(df)

# Correlation of x with y
col1, col2 = "x", "y"
corr = df[col1].corr(df[col2])
print("Correlation between", col1, "and", col2, "is:", round(corr, 2))

# Correlation of a column with itself is 1
col1, col2 = "x", "x"
corr = df[col1].corr(df[col2])
print("Correlation between", col1, "and", col2, "is:", round(corr, 2))

# Correlation of x with z
col1, col2 = "x", "z"
corr = df[col1].corr(df[col2])
print("Correlation between", col1, "and", col2, "is:", round(corr, 2))

# Correlation is symmetric: y with x equals x with y
col1, col2 = "y", "x"
corr = df[col1].corr(df[col2])
print("Correlation between", col1, "and", col2, "is:", round(corr, 2))

Output

Input DataFrame is:
   x  y  z
0  5  4  9
1  2  7  3
2  7  5  5
3  0  1  1
Correlation between x and y is: 0.41
Correlation between x and x is: 1.0
Correlation between x and z is: 0.72
Correlation between y and x is: 0.41
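By default, .corr() computes the Pearson correlation coefficient. As a rough check of the values above, the sketch below computes the same coefficient by hand with NumPy and compares it with the pandas result. The helper name pearson is hypothetical and introduced only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"x": [5, 2, 7, 0], "y": [4, 7, 5, 1], "z": [9, 3, 5, 1]})

def pearson(a, b):
    """Pearson correlation: covariance divided by the product of the spreads."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    da = a - a.mean()  # deviations from the mean
    db = b - b.mean()
    return (da * db).sum() / np.sqrt((da ** 2).sum() * (db ** 2).sum())

manual_xy = pearson(df["x"], df["y"])
pandas_xy = df["x"].corr(df["y"])
print(round(manual_xy, 2), round(pandas_xy, 2))

# All pairwise correlations at once
corr_matrix = df.corr()
print(corr_matrix.round(2))
```

Calling df.corr() on the whole DataFrame returns every pairwise correlation in one table, which avoids repeating the col1/col2 assignment for each pair.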
