To create a frequency table of a column in pandas, we use the
value_counts() function. The crosstab() function in pandas can also be used
to get a cross table or frequency table. Let's see how to create a frequency
matrix or frequency table of a column in pandas.
Frequency table in pandas python using the value_counts() function
Frequency table in pandas python using the crosstab() function
Frequency table in pandas python using groupby() followed by count()
Two way frequency table using the crosstab() function
Two way frequency table using overall proportions, row proportions and
column proportions
First let’s create a dataframe
import pandas as pd
import numpy as np

data = {'Product': ['Box','Bottles','Pen','Markers','Bottles','Pen',
                    'Markers','Bottles','Box','Markers','Markers','Pen'],
        'State': ['Alaska','California','Texas','North Carolina',
                  'California','Texas','Alaska','Texas','North Carolina',
                  'Alaska','California','Texas'],
        'Sales': [14,24,31,12,13,7,9,31,18,16,18,14]}

df1 = pd.DataFrame(data, columns=['Product','State','Sales'])
df1
df1 will be a dataframe with 12 rows and the columns Product, State and Sales.
Get frequency table of column in pandas python: Method 1
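The frequency table of the State column can be created by calling
value_counts() on the column, accessed here as an attribute, as shown below.
df1.State.value_counts()
So the frequency table will show 4 rows for Texas, 3 each for Alaska and
California, and 2 for North Carolina.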
Get frequency table of column in pandas python: Method 2
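The frequency table of the State column can also be created by selecting the
column with bracket notation before calling value_counts(), as shown below.
df1['State'].value_counts()
So the frequency table will again show 4 rows for Texas, 3 each for Alaska
and California, and 2 for North Carolina.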
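As a small extension of the two methods above, value_counts() can also return relative frequencies through its normalize parameter. The sketch below rebuilds only the State column from the example data; the variable names states, counts and shares are illustrative and not part of the original example.

```python
import pandas as pd

# State column from the example dataframe
states = pd.Series(
    ['Alaska', 'California', 'Texas', 'North Carolina', 'California', 'Texas',
     'Alaska', 'Texas', 'North Carolina', 'Alaska', 'California', 'Texas'],
    name='State',
)

# Absolute frequencies: number of rows per state
counts = states.value_counts()

# Relative frequencies: share of rows per state (values sum to 1)
shares = states.value_counts(normalize=True)

print(counts)
print(shares.round(2))
```

The order of tied values (Alaska and California here) may differ between pandas versions, so it is safer to look values up by label than by position.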
Get frequency table of column in pandas python: Method 3 crosstab()
The frequency table of the State column can also be created using the
crosstab() function as shown below. crosstab() takes the column as its index
argument and counts the frequency of occurrence of each of its values.
# frequency table using the crosstab() function

import pandas as pd
my_tab = pd.crosstab(index=df1["State"], columns="count")
my_tab
So the frequency table will be
col_0           count
State
Alaska              3
California          3
North Carolina      2
Texas               4
Get frequency table of column in pandas python: Method 4 groupby() count()
groupby() takes the column name as its argument and is followed by count(),
as shown below, to get the frequency table of the column in pandas.
# Get frequency table of the column using groupby() and count()

df1.groupby(['State'])['Sales'].count()
So the resulting frequency table will be
State
Alaska            3
California        3
North Carolina    2
Texas             4
Name: Sales, dtype: int64
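One caveat is worth illustrating here: count() counts only the non-missing values of the selected column, whereas size() counts rows. The sketch below uses a small made-up dataframe (not the article's data) with one missing Sales value to show the difference; the names demo, non_null_counts and row_counts are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one Sales value is missing for Texas
demo = pd.DataFrame({
    'State': ['Alaska', 'Texas', 'Texas', 'Alaska', 'Texas'],
    'Sales': [14, 31, np.nan, 9, 7],
})

# count() skips the missing Sales value
non_null_counts = demo.groupby('State')['Sales'].count()

# size() counts every row in each group, missing values included
row_counts = demo.groupby('State').size()

print(non_null_counts)
print(row_counts)
```

With the example dataframe df1 there are no missing values, so both approaches would give the same frequencies.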
Two way frequency table using crosstab() function:
A two way frequency table for the “State” and “Product” columns can be
created using the crosstab() function as shown below. crosstab() takes the
“State” column as index and the “Product” column as columns, and counts the
frequency of each cross tabulation.
# two way frequency table using the crosstab() function

import pandas as pd

my_crosstab = pd.crosstab(index=df1["State"],
                          columns=df1["Product"],
                          margins=True)  # Include row and column totals
my_crosstab
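So the resultant two way frequency table will be
Product         Bottles  Box  Markers  Pen  All
State
Alaska                0    1        2    0    3
California            2    0        1    0    3
North Carolina        0    1        1    0    2
Texas                 1    0        0    3    4
All                   3    2        4    3   12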
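The two way table can also be expressed as proportions rather than counts. A minimal sketch, assuming the normalize parameter of crosstab() is used for this (the original text names the topic but does not show its code): 'all' divides by the grand total, 'index' by each row total, and 'columns' by each column total. The names overall, row_prop and col_prop are illustrative.

```python
import pandas as pd

# Product and State columns from the example dataframe
data = {'Product': ['Box', 'Bottles', 'Pen', 'Markers', 'Bottles', 'Pen',
                    'Markers', 'Bottles', 'Box', 'Markers', 'Markers', 'Pen'],
        'State': ['Alaska', 'California', 'Texas', 'North Carolina',
                  'California', 'Texas', 'Alaska', 'Texas', 'North Carolina',
                  'Alaska', 'California', 'Texas']}
df1 = pd.DataFrame(data)

# Overall proportions: every cell divided by the grand total (12 rows)
overall = pd.crosstab(df1['State'], df1['Product'], normalize='all')

# Row proportions: each row sums to 1
row_prop = pd.crosstab(df1['State'], df1['Product'], normalize='index')

# Column proportions: each column sums to 1
col_prop = pd.crosstab(df1['State'], df1['Product'], normalize='columns')

print(row_prop.round(2))
print(col_prop.round(2))
```

For instance, row_prop.loc['Texas', 'Pen'] is 3 of the 4 Texas rows, and col_prop.loc['Alaska', 'Markers'] is 2 of the 4 Markers rows.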
How to Calculate Conditional Probability in Python
The conditional probability that event A occurs,
given that event B has occurred, is calculated as
follows:
P(A|B) = P(A∩B) / P(B)
where:
P(A∩B) = the probability that event A and
event B both occur.
P(B) = the probability that event B occurs.
The following example shows how to use this
formula to calculate conditional probabilities in
Python.
Example: Calculate Conditional Probability in Python
Suppose we send out a survey to 300 individuals
asking them which sport they like best: baseball,
basketball, football, or soccer.
We can create the following table in Python to
hold the survey responses:
import pandas as pd
import numpy as np

#create pandas DataFrame with raw data
df = pd.DataFrame({'gender': np.repeat(np.array(['Male', 'Female']), 150),
                   'sport': np.repeat(np.array(['Baseball', 'Basketball', 'Football',
                                                'Soccer', 'Baseball', 'Basketball',
                                                'Football', 'Soccer']),
                                      (34, 40, 58, 18, 34, 52, 20, 44))})

#produce contingency table to summarize raw data
survey_data = pd.crosstab(index=df['gender'], columns=df['sport'], margins=True)

#view contingency table
survey_data
sport   Baseball  Basketball  Football  Soccer  All
gender
Female        34          52        20      44  150
Male          34          40        58      18  150
All           68          92        78      62  300
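The contingency table can then be plugged into the formula P(A|B) = P(A∩B) / P(B). The following is a minimal sketch of one such calculation, the probability that a respondent is male given that they prefer baseball; the variable names (total, p_male_and_baseball and so on) are illustrative and not part of the original example.

```python
import numpy as np
import pandas as pd

# Rebuild the survey data and contingency table from the example
df = pd.DataFrame({'gender': np.repeat(np.array(['Male', 'Female']), 150),
                   'sport': np.repeat(np.array(['Baseball', 'Basketball', 'Football',
                                                'Soccer', 'Baseball', 'Basketball',
                                                'Football', 'Soccer']),
                                      (34, 40, 58, 18, 34, 52, 20, 44))})
survey_data = pd.crosstab(index=df['gender'], columns=df['sport'], margins=True)

# Total number of respondents (300)
total = survey_data.loc['All', 'All']

# P(Male and Baseball) = 34 / 300
p_male_and_baseball = survey_data.loc['Male', 'Baseball'] / total

# P(Baseball) = 68 / 300
p_baseball = survey_data.loc['All', 'Baseball'] / total

# P(Male | Baseball) = P(Male and Baseball) / P(Baseball) = 34 / 68
p_male_given_baseball = p_male_and_baseball / p_baseball
print(round(p_male_given_baseball, 4))
```

Because the totals cancel, the same value can be read directly from the table as 34 / 68, so half of the respondents who prefer baseball are male.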
How to get the correlation between two columns in Pandas?
We can use the .corr() method to get the correlation between two columns
in Pandas. Let's take an example and see how to apply this method.
Steps
Create a DataFrame, df (a two-dimensional, size-mutable, potentially heterogeneous tabular data structure).
Print the input DataFrame, df.
Initialize two variables, col1 and col2, and assign them the names of the columns whose correlation you want to find.
Find the correlation between col1 and col2 by using df[col1].corr(df[col2]) and save the correlation value in a variable, corr.
Print the correlation value, corr.
Example
import pandas as pd

df = pd.DataFrame(
    {
        "x": [5, 2, 7, 0],
        "y": [4, 7, 5, 1],
        "z": [9, 3, 5, 1]
    }
)
print(f"Input DataFrame is:\n{df}")

# Correlation between x and y
col1, col2 = "x", "y"
corr = df[col1].corr(df[col2])
print(f"Correlation between {col1} and {col2} is: {round(corr, 2)}")

# Correlation of a column with itself
col1, col2 = "x", "x"
corr = df[col1].corr(df[col2])
print(f"Correlation between {col1} and {col2} is: {round(corr, 2)}")

# Correlation between x and z
col1, col2 = "x", "z"
corr = df[col1].corr(df[col2])
print(f"Correlation between {col1} and {col2} is: {round(corr, 2)}")

# Correlation is symmetric: y with x equals x with y
col1, col2 = "y", "x"
corr = df[col1].corr(df[col2])
print(f"Correlation between {col1} and {col2} is: {round(corr, 2)}")
Output
Input DataFrame is:
   x  y  z
0  5  4  9
1  2  7  3
2  7  5  5
3  0  1  1
Correlation between x and y is: 0.41
Correlation between x and x is: 1.0
Correlation between x and z is: 0.72
Correlation between y and x is: 0.41
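When every pairwise correlation is needed, calling .corr() on the whole DataFrame returns them in one matrix instead of one pair at a time. A brief sketch using the same data; the name corr_matrix is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "x": [5, 2, 7, 0],
        "y": [4, 7, 5, 1],
        "z": [9, 3, 5, 1]
    }
)

# Pairwise Pearson correlation (the default method) between all numeric columns
corr_matrix = df.corr()
print(corr_matrix.round(2))

# A single pair can be looked up by row and column label
xy = corr_matrix.loc["x", "y"]
print(round(xy, 2))
```

The matrix is symmetric with ones on the diagonal, which matches the x with x and y with x results above.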