Fix: change the dtype of self._kernel when input args have a different dtype #3034
vfdev-5 merged 10 commits into pytorch:master from MarcBresson:master
Conversation
…t dtype
If both dtypes differ from the torch default dtype, F.conv2d will fail because self._kernel is of the default dtype. This commit solves the issue by switching the dtype of self._kernel.
It may be clearer this way. But _kernel should always have the pytorch default dtype, so the two solutions are equivalent. Co-authored-by: vfdev <[email protected]>
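A minimal standalone sketch of the failure mode and the fix (hypothetical shapes and names, not the actual PR diff):

```python
import torch
import torch.nn.functional as F

# The kernel is created once with the default dtype (float32 unless changed).
kernel = torch.ones(3, 1, 11, 11) / 121.0
# Inputs arrive in a non-default dtype, e.g. float64.
y_pred = torch.rand(2, 3, 28, 28, dtype=torch.float64)

# The fix: follow the dtype of the inputs before calling conv2d.
if kernel.dtype != y_pred.dtype:
    kernel = kernel.to(dtype=y_pred.dtype)

out = F.conv2d(y_pred, kernel, groups=3)  # succeeds; would raise without the cast
print(out.dtype)  # torch.float64
```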
@MarcBresson thanks for the fix. Can you also write a test here: https://github.com/pytorch/ignite/blob/master/tests/ignite/metrics/test_ssim.py? Let's create a separate test function like …
Nice idea to write the test! I just discovered that a few operations are not available in float16 on CPU. Working on a fix.
Just need a little help: how do I exclude a test from being run on CPU? I don't see any exclusion pattern in … And by the way, how is it possible that some tests fail while I haven't touched anything in this repo yet?
I think the simplest way to do that is to just call pytest: …
GH CI is rather flaky so some jobs are failing due to issues with distributed configs for the tests. Do not worry about that. |
Some functions used in SSIM.update do not exist for CPU and the half dtype, so the CPU tests are excluded.
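A minimal sketch of one common exclusion pattern (assumed here, not necessarily the one the PR ended up using): gate half-precision cases on CUDA availability with pytest.mark.skipif.

```python
import pytest
import torch

from ignite.metrics import SSIM


# Skip on machines without a GPU, since some ops used by SSIM.update
# are not implemented for half tensors on CPU.
@pytest.mark.skipif(not torch.cuda.is_available(), reason="Skip if no GPU")
def test_ssim_float16_on_cuda():
    y_pred = torch.rand(2, 3, 28, 28, device="cuda", dtype=torch.float16)
    y = y_pred * 0.8
    ssim = SSIM(data_range=1.0, device="cuda")
    ssim.update((y_pred, y))
    assert float(ssim.compute()) <= 1.0  # SSIM is bounded above by 1
```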
Here is a little analysis of the difference between ignite and scikit-image. Code sample:

```python
import torch
from ignite.metrics import SSIM
import tqdm
from skimage.metrics import structural_similarity as ski_ssim
import numpy as np
from matplotlib import pyplot as plt


def test_ssim(available_device, shape, dtype):
    y_pred = torch.rand(shape, device=available_device, dtype=dtype)
    y = y_pred * 0.8

    sigma = 1.5
    data_range = 1.0
    ssim = SSIM(data_range=data_range, sigma=sigma, device=available_device)
    ssim.update((y_pred, y))
    ignite_ssim = ssim.compute()

    # numpy has no bfloat16, so cast before moving to the CPU
    if y_pred.dtype == torch.bfloat16:
        y_pred = y_pred.to(dtype=torch.float16)

    skimg_pred = y_pred.cpu().numpy()
    skimg_y = skimg_pred * 0.8
    skimg_ssim = ski_ssim(
        skimg_pred,
        skimg_y,
        win_size=y_pred.shape[0] - 1,
        sigma=sigma,
        channel_axis=1,
        data_range=data_range,
    )

    difference = ignite_ssim - skimg_ssim
    return difference


diff = {}
for device in [torch.device("cpu"), torch.device("cuda")]:
    available_dtypes = [torch.float32, torch.float64]
    if device == torch.device("cuda"):
        available_dtypes.extend((torch.bfloat16, torch.float16))

    for dtype in available_dtypes:
        diff[str(dtype) + " on " + str(device)] = []
        for _ in tqdm.tqdm(range(1000)):
            difference_ssim = test_ssim(device, (12, 3, 28, 28), dtype=dtype)
            diff[str(dtype) + " on " + str(device)].append(difference_ssim)

for key, values in diff.items():
    # they are too far apart, so we exclude them
    if key in ["torch.float16 on cuda", "torch.bfloat16 on cuda"]:
        continue
    plt.hist(values, bins=10, label=key)

axes = plt.gca()
axes.ticklabel_format(style="scientific", scilimits=(0, 3))
plt.title("SSIM difference between ignite and scikit-image")
plt.xlabel("difference value")
plt.ylabel("frequency")
plt.legend()
plt.show()
```
@MarcBresson can you run the following to fix the formatting issue: …
Looks good to me, thanks for the PR @MarcBresson!
I'll merge it once the required CI jobs are done.
By the way, while checking your PR I saw in the actual code that we are doing rather strange things with the device of self._kernel. Would you like to tackle this problem?
Basically, self._kernel should respect self._device, but there may be a trade-off in running all padding and conv ops on CUDA before moving to self._device. We may want to measure that and make an appropriate decision.
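To quantify that trade-off, a rough measurement sketch (assumed shapes and kernel size, not part of the PR) could compare the grouped conv on CPU against the conv on CUDA followed by the move back:

```python
import torch
import torch.nn.functional as F
from torch.utils import benchmark

y = torch.rand(12, 3, 512, 512)  # assumed batch of images
k = torch.rand(3, 1, 11, 11)     # assumed Gaussian-like kernel, groups=3

# Baseline: convolve directly on CPU.
t_cpu = benchmark.Timer(
    stmt="F.conv2d(y, k, groups=3)",
    globals={"F": F, "y": y, "k": k},
).timeit(100)
print(t_cpu)

# Alternative: convolve on CUDA, then move the result to the target device.
if torch.cuda.is_available():
    y_cuda, k_cuda = y.cuda(), k.cuda()
    t_cuda = benchmark.Timer(
        stmt="F.conv2d(y, k, groups=3).cpu()",
        globals={"F": F, "y": y_cuda, "k": k_cuda},
    ).timeit(100)
    print(t_cuda)
```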
@vfdev-5 you are welcome! Happy to help. I think that … The …
Well, this code: … seems a bit weird, as only a kernel with ndims < 4 will have y_pred.device. And do we need to change …
Just a new analysis, this time on dtype. These are comparisons of precision between float32 and each of float16, bfloat16 and float64. The error is signed rather than absolute, so we can see whether a dtype overshoots compared to float32.

```python
import tqdm
import torch
from ignite.metrics import SSIM
from matplotlib import pyplot as plt


def test_ssim_dtype_error(available_device, shape, dtype):
    y_pred_float32 = torch.rand(shape, device=available_device, dtype=torch.float32)
    y_float32 = y_pred_float32 * 0.8

    y_pred_typed = torch.clone(y_pred_float32)
    y_pred_typed = y_pred_typed.to(dtype=dtype)
    y_typed = y_pred_typed * 0.8

    sigma = 1.5
    data_range = 1.0

    ssim = SSIM(data_range=data_range, sigma=sigma, device=available_device)
    ssim.update((y_pred_float32, y_float32))
    ssim_float32 = ssim.compute()

    ssim_ty = SSIM(data_range=data_range, sigma=sigma, device=available_device)
    ssim_ty.update((y_pred_typed, y_typed))
    ssim_typed = ssim_ty.compute()

    difference = ssim_float32 - ssim_typed
    return difference


diff = {}
for device in [torch.device("cuda")]:
    available_dtypes = [torch.float32, torch.float64]
    if device == torch.device("cuda"):
        available_dtypes.extend((torch.bfloat16, torch.float16))

    for dtype in available_dtypes:
        diff[str(dtype) + " on " + str(device)] = []
        for _ in tqdm.tqdm(range(1000)):
            difference_ssim = test_ssim_dtype_error(device, (12, 3, 28, 28), dtype=dtype)
            diff[str(dtype) + " on " + str(device)].append(difference_ssim)

for dtype in diff:
    plt.hist(diff[dtype], bins=50)
    axes = plt.gca()
    axes.ticklabel_format(axis="x", style="scientific", scilimits=(0, 0))
    plt.title(f"SSIM difference between float32 and a {dtype}")
    plt.xlabel("difference value")
    plt.ylabel("frequency")
    plt.show()
```





When the SSIM metric is given a float16 (Half) or float64 (Double) input, it fails because self._kernel will be of the default dtype.
There are two ways around that:

- ask the user to change the global default dtype, e.g. torch.set_default_dtype(torch.float16);
- switch the dtype of self._kernel to match the input args.

This pull request puts the second option in action. The result is now transparent to the user, who can use float16 or float64 on GPU directly.
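For illustration, a usage sketch of the behaviour after this change (hypothetical shapes; assumes the fix is in place):

```python
import torch
from ignite.metrics import SSIM

# float64 inputs now work out of the box; self._kernel is cast to match them.
y_pred = torch.rand(4, 3, 28, 28, dtype=torch.float64)
y = y_pred * 0.9

metric = SSIM(data_range=1.0)
metric.update((y_pred, y))
print(metric.compute())
```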
A new test has been added to ensure the right behaviour of this change.
Check list: