
Data Analysis (Pandas)

The document discusses four methods for creating frequency tables of columns in pandas: 1) using the value_counts() function, 2) using the crosstab() function, 3) using the groupby() and count() functions, and 4) creating a two way frequency table using crosstab(). Examples are provided for each method.

Uploaded by

shweta mishra

To create a frequency table of a column in pandas, we can use the value_counts() function. The crosstab() function in pandas also produces a frequency table, as well as a cross table of two columns. Let’s see how to create a frequency table of a column in pandas.

• Frequency table using the value_counts() function
• Frequency table using the crosstab() function
• Frequency count of the dataframe using groupby() followed by count()
• Two-way frequency table using the crosstab() function
• Two-way frequency table as proportions: overall, row and column proportions
First let’s create a dataframe
import pandas as pd
import numpy as np

# Raw data: one row per sale record
data = {'Product': ['Box', 'Bottles', 'Pen', 'Markers', 'Bottles', 'Pen',
                    'Markers', 'Bottles', 'Box', 'Markers', 'Markers', 'Pen'],
        'State': ['Alaska', 'California', 'Texas', 'North Carolina',
                  'California', 'Texas', 'Alaska', 'Texas',
                  'North Carolina', 'Alaska', 'California', 'Texas'],
        'Sales': [14, 24, 31, 12, 13, 7, 9, 31, 18, 16, 18, 14]}

df1 = pd.DataFrame(data, columns=['Product', 'State', 'Sales'])
df1
df1 will be

Get frequency table of column in pandas python: Method 1


A frequency table for the State column can be created by calling value_counts() on the column, accessed here as an attribute:

df1.State.value_counts()
So the frequency table will be

Get frequency table of column in pandas python: Method 2


The same frequency table can be created by selecting the State column with bracket indexing and calling value_counts():

df1['State'].value_counts()
So the frequency table will be
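To illustrate what value_counts() returns, here is a minimal, self-contained sketch that rebuilds only the State column from the dataframe above. The normalize=True call is an addition not used in the walkthrough; it is a standard value_counts() parameter that returns relative frequencies instead of counts. The variable names are illustrative.

```python
import pandas as pd

# State column from the example dataframe (12 rows)
states = pd.Series(['Alaska', 'California', 'Texas', 'North Carolina',
                    'California', 'Texas', 'Alaska', 'Texas',
                    'North Carolina', 'Alaska', 'California', 'Texas'],
                   name='State')

# Absolute frequency of each state
counts = states.value_counts()

# Relative frequency (each count divided by the number of rows)
shares = states.value_counts(normalize=True)

print(counts)
print(shares)
```

Texas appears in 4 of the 12 rows, so its count is 4 and its share is one third.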

Get frequency table of column in pandas python: Method 3 crosstab()

A frequency table for the State column can be created using the crosstab() function as shown below. crosstab() takes the column as its index argument and counts how often each of its values occurs; passing the string "count" as columns yields a single column labeled count.

# frequency table using the crosstab() function
import pandas as pd

my_tab = pd.crosstab(index=df1["State"], columns="count")
my_tab
So the frequency table will be
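As a quick check, the one-way crosstab() table should hold the same counts as value_counts(). The sketch below rebuilds the State column on its own and compares the two; the variable names vc and same_counts are illustrative.

```python
import pandas as pd

states = pd.Series(['Alaska', 'California', 'Texas', 'North Carolina',
                    'California', 'Texas', 'Alaska', 'Texas',
                    'North Carolina', 'Alaska', 'California', 'Texas'],
                   name='State')

# One-way frequency table; the constant label "count" names the single column
my_tab = pd.crosstab(index=states, columns="count")

# The same counts from value_counts(), sorted by state name for comparison
vc = states.value_counts().sort_index()

# crosstab() also orders its rows by state name, so the values line up
same_counts = bool((my_tab["count"].to_numpy() == vc.to_numpy()).all())

print(my_tab)
print(same_counts)
```

The main practical difference is the return type: crosstab() gives a DataFrame, while value_counts() gives a Series.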
Get frequency table of column in pandas python: Method 4 groupby() count()
groupby() takes the column name as its argument. Following it with count() on another column returns the number of non-missing values in each group, which gives the frequency table of the column in pandas.
# Get the frequency table of the column using groupby() and count()

df1.groupby(['State'])['Sales'].count()
So the result with the frequency table will be
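One caveat with this method: count() skips missing values in the selected column, so it matches the row frequency only when that column has no NaN. The sketch below uses a small hypothetical dataframe (not the df1 above) with one missing Sales value, and contrasts count() with size(), which counts rows regardless of missing data.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing Sales value, for illustration only
df = pd.DataFrame({
    'State': ['Alaska', 'Alaska', 'Texas', 'Texas', 'Texas'],
    'Sales': [14, np.nan, 31, 7, 31],
})

# count() counts only non-missing Sales values per state
non_missing = df.groupby('State')['Sales'].count()

# size() counts all rows per state, including rows with missing Sales
row_counts = df.groupby('State').size()

print(non_missing)
print(row_counts)
```

Here Alaska has two rows but only one non-missing Sales value, so the two results differ for that state.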

Two way frequency table using crosstab() function:


A two-way frequency table for the “State” and “Product” columns can be created using the crosstab() function as shown below. crosstab() takes the “State” column as index and the “Product” column as columns, and counts the frequency of each combination.
# two-way frequency table using the crosstab() function
import pandas as pd

my_crosstab = pd.crosstab(index=df1["State"],
                          columns=df1["Product"],
                          margins=True)  # include row and column totals
my_crosstab
So the resultant two way frequency table will be
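The proportion-based two-way tables mentioned at the outset can be sketched with the normalize parameter of crosstab(). This is a minimal example under the assumption that the same Product and State values are used; the variable names overall, row_prop and col_prop are illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Product': ['Box', 'Bottles', 'Pen', 'Markers', 'Bottles', 'Pen',
                'Markers', 'Bottles', 'Box', 'Markers', 'Markers', 'Pen'],
    'State': ['Alaska', 'California', 'Texas', 'North Carolina',
              'California', 'Texas', 'Alaska', 'Texas',
              'North Carolina', 'Alaska', 'California', 'Texas'],
})

# Each cell as a share of all 12 rows
overall = pd.crosstab(df1['State'], df1['Product'], normalize='all')

# Row proportions: each row (state) sums to 1
row_prop = pd.crosstab(df1['State'], df1['Product'], normalize='index')

# Column proportions: each column (product) sums to 1
col_prop = pd.crosstab(df1['State'], df1['Product'], normalize='columns')

print(overall)
print(row_prop)
print(col_prop)
```

For example, 3 of the 4 Texas rows are Pen, so the Texas/Pen row proportion is 0.75, while all 3 Pen rows come from Texas, so the column proportion for the same cell is 1.0.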

How to Calculate Conditional Probability in Python

The conditional probability that event A occurs, given that event B has occurred, is calculated as follows:
P(A|B) = P(A∩B) / P(B)
where:
P(A∩B) = the probability that event A and
event B both occur.
P(B) = the probability that event B occurs.
The following example shows how to use this
formula to calculate conditional probabilities in
Python.
Example: Calculate Conditional Probability in Python
Suppose we send out a survey to 300 individuals
asking them which sport they like best: baseball,
basketball, football, or soccer.
We can create the following table in Python to
hold the survey responses:
import pandas as pd
import numpy as np

#create pandas DataFrame with raw data
df = pd.DataFrame({'gender': np.repeat(np.array(['Male', 'Female']), 150),
                   'sport': np.repeat(np.array(['Baseball', 'Basketball', 'Football',
                                                'Soccer', 'Baseball', 'Basketball',
                                                'Football', 'Soccer']),
                                      (34, 40, 58, 18, 34, 52, 20, 44))})

#produce contingency table to summarize raw data
survey_data = pd.crosstab(index=df['gender'], columns=df['sport'], margins=True)

#view contingency table
survey_data

sport   Baseball  Basketball  Football  Soccer  All
gender
Female        34          52        20      44  150
Male          34          40        58      18  150
All           68          92        78      62  300
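With the contingency table in place, the formula P(A|B) = P(A∩B) / P(B) can be applied directly to its cells. The sketch below is one way to do this; the two events chosen and the variable names are illustrative assumptions. It rebuilds the survey data so that it runs on its own.

```python
import numpy as np
import pandas as pd

# Rebuild the survey data: 150 male respondents followed by 150 female
df = pd.DataFrame({
    'gender': np.repeat(np.array(['Male', 'Female']), 150),
    'sport': np.repeat(np.array(['Baseball', 'Basketball', 'Football', 'Soccer',
                                 'Baseball', 'Basketball', 'Football', 'Soccer']),
                       (34, 40, 58, 18, 34, 52, 20, 44)),
})
survey_data = pd.crosstab(index=df['gender'], columns=df['sport'], margins=True)

total = survey_data.loc['All', 'All']  # 300 respondents

# P(Male | Baseball) = P(Male and Baseball) / P(Baseball)
p_male_and_baseball = survey_data.loc['Male', 'Baseball'] / total
p_baseball = survey_data.loc['All', 'Baseball'] / total
p_male_given_baseball = p_male_and_baseball / p_baseball

# P(Basketball | Female) = P(Female and Basketball) / P(Female)
p_female_and_basketball = survey_data.loc['Female', 'Basketball'] / total
p_female = survey_data.loc['Female', 'All'] / total
p_basketball_given_female = p_female_and_basketball / p_female

print(p_male_given_baseball)                 # 34 / 68 = 0.5
print(round(p_basketball_given_female, 4))   # 52 / 150 = 0.3467
```

Because the total of 300 cancels out, each conditional probability reduces to a cell count divided by the matching row or column total.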

How to get the correlation between two columns in Pandas?
We can use the .corr() method to get the correlation between two columns
in Pandas. Let's take an example and see how to apply this method.
Steps
• Create a DataFrame, df (two-dimensional, size-mutable, potentially heterogeneous tabular data).
• Print the input DataFrame, df.
• Initialize two variables, col1 and col2, and assign them the names of the columns whose correlation you want to find.
• Find the correlation between col1 and col2 by using df[col1].corr(df[col2]) and save the correlation value in a variable, corr.
• Print the correlation value, corr.

Example
import pandas as pd

df = pd.DataFrame(
    {
        "x": [5, 2, 7, 0],
        "y": [4, 7, 5, 1],
        "z": [9, 3, 5, 1]
    }
)
print(f"Input DataFrame is:\n{df}")

# Pearson correlation between x and y
col1, col2 = "x", "y"
corr = df[col1].corr(df[col2])
print(f"Correlation between {col1} and {col2} is: {round(corr, 2)}")

# A column is perfectly correlated with itself
col1, col2 = "x", "x"
corr = df[col1].corr(df[col2])
print(f"Correlation between {col1} and {col2} is: {round(corr, 2)}")

col1, col2 = "x", "z"
corr = df[col1].corr(df[col2])
print(f"Correlation between {col1} and {col2} is: {round(corr, 2)}")

# Correlation is symmetric: (y, x) gives the same value as (x, y)
col1, col2 = "y", "x"
corr = df[col1].corr(df[col2])
print(f"Correlation between {col1} and {col2} is: {round(corr, 2)}")
Output
Input DataFrame is:
   x  y  z
0  5  4  9
1  2  7  3
2  7  5  5
3  0  1  1
Correlation between x and y is: 0.41
Correlation between x and x is: 1.0
Correlation between x and z is: 0.72
Correlation between y and x is: 0.41
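When every pairwise correlation is needed, calling corr() on the whole DataFrame returns them in a single matrix. This is a minimal sketch using the same x, y and z values; DataFrame.corr() computes the Pearson correlation by default, and the name corr_matrix is illustrative.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "x": [5, 2, 7, 0],
        "y": [4, 7, 5, 1],
        "z": [9, 3, 5, 1],
    }
)

# Pairwise Pearson correlations between all numeric columns
corr_matrix = df.corr()

print(corr_matrix.round(2))

# Individual values can be read off by column name
print(round(corr_matrix.loc["x", "y"], 2))  # 0.41
print(round(corr_matrix.loc["x", "z"], 2))  # 0.72
```

The matrix is symmetric with ones on the diagonal, which matches the pairwise results above.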
