Convert Covariance Matrix to Correlation Matrix using Python

Last Updated : 30 Sep, 2025

In statistics, covariance measures how two variables vary together, while correlation rescales that relationship to a value between -1 and 1, which makes it easier to interpret. In this article, we discuss the relationship between covariance and correlation and write our own functions for calculating both in Python.

Relationship Between Covariance and Correlation

Correlation is simply normalized covariance, as the formula below shows:

\text{corr}(x, y) = \frac{\text{cov}(x, y)}{\sigma_x \cdot \sigma_y}

where \sigma_x and \sigma_y are the standard deviations of x and y, respectively.
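To make the formula concrete, here is a minimal sketch that applies it to two small arrays. The values of x and y are made up for illustration, and the population form (dividing by n) is assumed for both the covariance and the standard deviations.

```python
import numpy as np

# Two small made-up samples (illustrative values only).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.0, 4.0, 5.0, 4.0, 5.0])

# Population covariance: the mean of the products of deviations.
cov_xy = np.mean((x - x.mean()) * (y - y.mean()))   # 6 / 5 = 1.2

# Population standard deviations (NumPy's default, ddof=0).
sigma_x, sigma_y = x.std(), y.std()

# corr(x, y) = cov(x, y) / (sigma_x * sigma_y)
corr_xy = cov_xy / (sigma_x * sigma_y)
print(corr_xy)   # about 0.7746, i.e. sqrt(0.6)
```

The result agrees with np.corrcoef(x, y)[0, 1], which computes the same quantity directly.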

Program to convert Covariance to Correlation matrix 

We will use the Iris dataset for demonstration. The goal is to first compute the covariance matrix manually and then convert it to a correlation matrix.

1. Loading and displaying the dataset

Python
import numpy as np
import pandas as pd
data = pd.read_csv("iris.csv")
data.head()

Output: the first five rows of the dataset (figure: irisDataset).

In this example, we exclude the target column (species) since we only want numeric features:

Python
data = data.iloc[:, :-1].values  # drop the last column (species) and convert to a NumPy array

2. Define a Function to Calculate Covariance Between Two Variables

The covariance between two variables x and y measures how much they vary together.

Python
# Calculates the population covariance between two 1D arrays x and y.
def calcCov(x, y):
    mean_x, mean_y = x.mean(), y.mean()  # Mean of each variable
    n = len(x)                           # Number of observations
    # Population covariance: sum((x - mean_x) * (y - mean_y)) / n
    # (the sample covariance would divide by n - 1 instead)
    return np.sum((x - mean_x) * (y - mean_y)) / n

Explanation:

  • x.mean() and y.mean() calculate the average of each variable.
  • The formula multiplies deviations of each observation from the mean and averages them.
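As a quick sanity check, calcCov can be tried on two tiny arrays whose covariance is easy to work out by hand. The arrays below are made-up values, and the function is restated so the snippet runs on its own.

```python
import numpy as np

def calcCov(x, y):
    # Population covariance: average product of deviations from the means.
    mean_x, mean_y = x.mean(), y.mean()
    n = len(x)
    return np.sum((x - mean_x) * (y - mean_y)) / n

# Made-up values for illustration.
x = np.array([2.0, 4.0, 6.0, 8.0])   # mean 5, deviations -3, -1, 1, 3
y = np.array([1.0, 3.0, 2.0, 6.0])   # mean 3, deviations -2, 0, -1, 3

cov_xy = calcCov(x, y)   # (6 + 0 - 1 + 9) / 4 = 3.5
cov_xx = calcCov(x, x)   # covariance of x with itself is its variance: 20 / 4 = 5.0
print(cov_xy, cov_xx)
```

Note that the covariance of a variable with itself equals its variance, which is why the diagonal of a covariance matrix holds the variances.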

3. Compute the Covariance Matrix

We can now build the full covariance matrix for all numeric features.

Python
# Calculates the covariance matrix for the dataset.
def covMat(data):
    rows, cols = data.shape
    cov_matrix = np.zeros((cols, cols))  # Initialize a square matrix
    
    # Fill the covariance matrix
    for i in range(cols):
        for j in range(cols):
            cov_matrix[i][j] = calcCov(data[:, i], data[:, j])
    return cov_matrix

# Compute covariance matrix
covMat(data)

Explanation:

  • cov_matrix[i][j] stores covariance between the i-th and j-th feature.
  • The matrix is symmetric because cov(x, y) = cov(y, x)
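The nested loops make the definition explicit, but the same matrix can also be obtained with a single matrix product. The sketch below is one possible vectorized version, not the article's method: cov_matrix_vectorized is a hypothetical helper name, and random data stands in for the Iris features.

```python
import numpy as np

def cov_matrix_vectorized(data):
    """Population covariance matrix (divides by n) of a 2D array
    with observations in rows and features in columns."""
    n = data.shape[0]
    centered = data - data.mean(axis=0)   # subtract each column's mean
    return centered.T @ centered / n      # (cols x cols) matrix

# Synthetic stand-in data: 50 observations of 4 features.
rng = np.random.default_rng(0)
sample = rng.normal(size=(50, 4))

cov = cov_matrix_vectorized(sample)
print(np.allclose(cov, cov.T))                                     # symmetric
print(np.allclose(cov, np.cov(sample, rowvar=False, bias=True)))   # agrees with NumPy
```

Entry (i, j) of centered.T @ centered is the sum of products of deviations for features i and j, so dividing by n gives the same values the loop computes one pair at a time.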

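This manual calculation matches NumPy's np.cov, provided bias=True is passed so that NumPy also divides by n (by default it divides by n - 1):

Python
np.cov(data, rowvar=False, bias=True)

rowvar=False ensures columns are treated as features and rows as observations.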
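One detail worth checking when comparing against calcCov is the normalization: np.cov divides by n - 1 unless told otherwise, whereas calcCov divides by n. The sketch below, on synthetic data used purely for illustration, shows how the two results relate.

```python
import numpy as np

# Synthetic stand-in data: 30 observations of 3 features.
rng = np.random.default_rng(1)
sample = rng.normal(size=(30, 3))
n = sample.shape[0]

sample_cov = np.cov(sample, rowvar=False)                  # divides by n - 1 (default)
population_cov = np.cov(sample, rowvar=False, bias=True)   # divides by n

# The two differ only by the constant factor (n - 1) / n.
print(np.allclose(population_cov, sample_cov * (n - 1) / n))
```

For a dataset the size of Iris the difference is small, but the two matrices are not identical, so an exact comparison needs matching normalization.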

4. Convert Covariance Matrix to Correlation Matrix

Correlation is simply normalized covariance: each covariance is divided by the product of the standard deviations of the two variables.

Python
def corrMat(data):
    rows, cols = data.shape
    corr_matrix = np.zeros((cols, cols))
    
    # Compute correlation for each pair of variables
    for i in range(cols):
        for j in range(cols):
            x, y = data[:, i], data[:, j]
            # Normalize covariance by product of standard deviations
            corr_matrix[i][j] = calcCov(x, y) / (x.std() * y.std())
    return corr_matrix

# Compute correlation matrix
corrMat(data)

Explanation:

  • For each pair of features (i, j), the covariance is normalized by x.std() * y.std() to get correlation.
  • Values are always between -1 and 1, indicating the strength and direction of the relationship.
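If only the covariance matrix is available and not the raw data, the correlation matrix can still be recovered, because the diagonal of the covariance matrix holds the variances. The following is a minimal sketch under that assumption. cov_to_corr is a hypothetical helper name, the data is synthetic, and every variable is assumed to have nonzero variance.

```python
import numpy as np

def cov_to_corr(cov):
    """Convert a covariance matrix to a correlation matrix.
    Assumes every variable has nonzero variance."""
    std = np.sqrt(np.diag(cov))       # standard deviations from the diagonal
    return cov / np.outer(std, std)   # divide entry (i, j) by std_i * std_j

# Synthetic stand-in data: 60 observations of 4 features.
rng = np.random.default_rng(3)
sample = rng.normal(size=(60, 4))

cov = np.cov(sample, rowvar=False)
corr = cov_to_corr(cov)

print(np.allclose(np.diag(corr), 1.0))                       # ones on the diagonal
print(np.allclose(corr, np.corrcoef(sample, rowvar=False)))  # agrees with NumPy
```

np.outer(std, std) builds the matrix of products std_i * std_j, so the division normalizes every entry at once without an explicit loop.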

5. Verify Using NumPy

Python
np.corrcoef(data, rowvar=False)

This gives the same correlation matrix in a single step.
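The agreement holds regardless of whether covariance is normalized by n or by n - 1, as long as the standard deviations use the same convention, since the factor cancels in the ratio. A small sketch on synthetic data illustrates this. corr_from_cov is a hypothetical helper written only for this check.

```python
import numpy as np

# Synthetic stand-in data: 40 observations of 3 features.
rng = np.random.default_rng(2)
sample = rng.normal(size=(40, 3))

def corr_from_cov(sample, ddof):
    # Covariance and standard deviations computed with the same ddof.
    cov = np.cov(sample, rowvar=False, ddof=ddof)
    std = sample.std(axis=0, ddof=ddof)
    return cov / np.outer(std, std)

corr_n = corr_from_cov(sample, ddof=0)           # divide by n
corr_n_minus_1 = corr_from_cov(sample, ddof=1)   # divide by n - 1

print(np.allclose(corr_n, corr_n_minus_1))
print(np.allclose(corr_n, np.corrcoef(sample, rowvar=False)))
```

Mixing conventions, for example a covariance divided by n - 1 with standard deviations divided by n, would scale every entry by n / (n - 1) and push the diagonal slightly above 1.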

