How to Calculate the Interquartile Range in Python


The interquartile range, often denoted “IQR”, measures the spread of the middle 50% of a dataset. It is calculated as the difference between the third quartile (the 75th percentile) and the first quartile (the 25th percentile) of a dataset: IQR = Q3 - Q1.

Fortunately, it’s easy to calculate the interquartile range of a dataset in Python using the numpy.percentile() function.

This tutorial shows several examples of how to use this function in practice.
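
One caveat before the examples: by default, numpy.percentile() interpolates linearly between sorted values, so its quartiles can differ from those of the "median of each half" convention used in some textbooks and calculators. The sketch below contrasts the two on small datasets. The helper iqr_median_of_halves() is a hypothetical name introduced only for this illustration, and it assumes at least two values; neither convention is presented here as the single correct one.

```python
import numpy as np

def iqr_median_of_halves(values):
    """Hypothetical helper: IQR where Q1 and Q3 are the medians of the lower
    and upper halves; the overall median is left out when n is odd.
    Assumes at least two values."""
    s = np.sort(np.asarray(values))
    half = len(s) // 2
    lower = s[:half]     # smallest half of the sorted values
    upper = s[-half:]    # largest half; skips the middle element when n is odd
    return np.median(upper) - np.median(lower)

for values in ([1, 2, 3, 4, 5, 6, 7, 8, 9], [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]):
    data = np.array(values)
    q3, q1 = np.percentile(data, [75, 25])   # default: linear interpolation
    print(len(data), q3 - q1, iqr_median_of_halves(data))
```

For the nine-value dataset the two conventions give 4.0 and 5.0, and for the ten-value dataset they give 4.5 and 5.0, so a mismatch with another tool usually reflects a different quartile definition rather than a coding error.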

Example 1: Interquartile Range of One Array

The following code shows how to calculate the interquartile range of values in a single array:

import numpy as np

#define array of data
data = np.array([14, 19, 20, 22, 24, 26, 27, 30, 30, 31, 36, 38, 44, 47])

#calculate interquartile range
q3, q1 = np.percentile(data, [75, 25])
iqr = q3 - q1

#display interquartile range
print(iqr)

12.25

The interquartile range of this dataset turns out to be 12.25. This is the spread of the middle 50% of values in this dataset.
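
To see where 12.25 comes from, the arithmetic behind the default linear interpolation can be reproduced by hand. This is a minimal sketch, not NumPy's actual implementation; linear_percentile() is a hypothetical helper written for this illustration.

```python
import numpy as np

data = np.array([14, 19, 20, 22, 24, 26, 27, 30, 30, 31, 36, 38, 44, 47])

def linear_percentile(values, p):
    """Hypothetical helper: p-th percentile by linear interpolation
    between the two nearest sorted values."""
    s = np.sort(values)
    pos = (p / 100) * (len(s) - 1)    # fractional position in the sorted data
    lo = int(np.floor(pos))           # index of the value just below
    hi = min(lo + 1, len(s) - 1)      # index of the value just above
    frac = pos - lo                   # how far between the two values
    return s[lo] + frac * (s[hi] - s[lo])

q1 = linear_percentile(data, 25)   # position 3.25: 22 + 0.25 * (24 - 22) = 22.5
q3 = linear_percentile(data, 75)   # position 9.75: 31 + 0.75 * (36 - 31) = 34.75
print(q3 - q1)                     # 12.25
```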

Example 2: Interquartile Range of a Data Frame Column

The following code shows how to calculate the interquartile range of a single column in a data frame:

import numpy as np
import pandas as pd

#create data frame
df = pd.DataFrame({'rating': [90, 85, 82, 88, 94, 90, 76, 75, 87, 86],
                   'points': [25, 20, 14, 16, 27, 20, 12, 15, 14, 19],
                   'assists': [5, 7, 7, 8, 5, 7, 6, 9, 9, 5],
                   'rebounds': [11, 8, 10, 6, 6, 9, 6, 10, 10, 7]})

#calculate interquartile range of values in the 'points' column
q75, q25 = np.percentile(df['points'], [75, 25])
iqr = q75 - q25

#display interquartile range
print(iqr)

5.75

The interquartile range of values in the points column turns out to be 5.75.
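
As an alternative sketch, the same value can be obtained without NumPy by using the pandas Series.quantile() method, which also interpolates linearly by default. The variable names here are illustrative only.

```python
import pandas as pd

#same values as the 'points' column above
points = pd.Series([25, 20, 14, 16, 27, 20, 12, 15, 14, 19], name='points')

#quantile() takes fractions (0.25, 0.75) rather than percentages (25, 75)
iqr = points.quantile(0.75) - points.quantile(0.25)
print(iqr)  # 5.75
```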

Example 3: Interquartile Range of Multiple Data Frame Columns

The following code shows how to calculate the interquartile range of multiple columns in a data frame at once:

import numpy as np
import pandas as pd

#create data frame
df = pd.DataFrame({'rating': [90, 85, 82, 88, 94, 90, 76, 75, 87, 86],
                   'points': [25, 20, 14, 16, 27, 20, 12, 15, 14, 19],
                   'assists': [5, 7, 7, 8, 5, 7, 6, 9, 9, 5],
                   'rebounds': [11, 8, 10, 6, 6, 9, 6, 10, 10, 7]})

#define function to calculate interquartile range
def find_iqr(x):
    #np.percentile returns [Q3, Q1]; unpacking them into np.subtract gives Q3 - Q1
    return np.subtract(*np.percentile(x, [75, 25]))

#calculate IQR for 'rating' and 'points' columns
df[['rating', 'points']].apply(find_iqr)

rating    6.75
points    5.75
dtype: float64

#calculate IQR for all columns
df.apply(find_iqr)

rating      6.75
points      5.75
assists     2.50
rebounds    3.75
dtype: float64

Note: By default, pandas.DataFrame.apply() calls find_iqr() once per column, which is how the IQR is calculated for multiple columns of the data frame above at once.
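
A possible alternative, shown here only as a sketch, is to skip the helper function and subtract two calls to pandas.DataFrame.quantile(), which computes the requested quantile for every numeric column. The name iqr_by_column is an assumption made for this example.

```python
import pandas as pd

df = pd.DataFrame({'rating': [90, 85, 82, 88, 94, 90, 76, 75, 87, 86],
                   'points': [25, 20, 14, 16, 27, 20, 12, 15, 14, 19],
                   'assists': [5, 7, 7, 8, 5, 7, 6, 9, 9, 5],
                   'rebounds': [11, 8, 10, 6, 6, 9, 6, 10, 10, 7]})

#each quantile() call returns one value per column; subtracting aligns them by column name
iqr_by_column = df.quantile(0.75) - df.quantile(0.25)
print(iqr_by_column)
```

This should give the same four values as df.apply(find_iqr), since both approaches use linear interpolation by default.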

Additional Resources

Is the Interquartile Range (IQR) Affected By Outliers?
How to Calculate the Interquartile Range (IQR) in Excel
Interquartile Range Calculator

3 Replies to “How to Calculate the Interquartile Range in Python”

  1. Can this method calculate the IQR for a fixed rating/points/assists/rebounds value? For instance, when rating = 90 and points = 5.75, then calculate the IQR. Is it possible to add this condition to the “define” function? Thank you!

  2. Hi Zach,

    Sorry to say, but I don’t think your code is right. If you cross-check with the “How to Calculate the Interquartile Range (IQR) in Excel” article or the “Interquartile Range Calculator” that are linked under “Additional Resources”, you will see that both show different results for your sample datasets.

  3. This method appears to only work for datasets with an even number of elements, because when calculating the first and third quartiles it should exclude the median. This isn’t a problem in even-numbered sets because the median isn’t actually an element in the set (it’s the mean of the two middle elements), but in odd-numbered sets there is an actual element that is the median, which needs to be excluded.
    For example, with the set [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] this method works because the median is 5.5 and so the first half of the dataset is [1, 2, 3, 4, 5] and the second half is [6, 7, 8, 9, 10]. In this case this method would correctly determine that Q1 = 3 and Q3 = 8, and so it would correctly calculate the interquartile range as 5.
    However, say the dataset was [1, 2, 3, 4, 5, 6, 7, 8, 9]. In this case the first half of the dataset would be [1, 2, 3, 4] and the second half would be [6, 7, 8, 9], with 5 excluded. The correct Q1 would be 2.5 and the correct Q3 would be 7.5. However, this method would incorrectly determine that Q1 = 3 and Q3 = 7, because it wouldn’t exclude the median, so it would incorrectly calculate the IQR as 4 rather than 5.
