Fix: change the dtype of self._kernel when input args have a different dtype #3034
vfdev-5 merged 10 commits into pytorch:master from MarcBresson:master
Conversation
…t dtype
If both dtypes differ from the torch default dtype, F.conv2d will fail because self._kernel is of the default dtype. This commit solves the issue by switching the dtype of self._kernel.
It may be clearer this way. But _kernel should always have the pytorch default dtype, so the two solutions are equivalent. Co-authored-by: vfdev <[email protected]>
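A minimal standalone sketch of the failure mode and the fix (hypothetical shapes and names, not the actual PR diff):

```python
import torch
import torch.nn.functional as F

# The kernel is created once with the default dtype (float32 unless changed).
kernel = torch.ones(3, 1, 11, 11) / 121.0
# Inputs arrive in a non-default dtype, e.g. float64.
y_pred = torch.rand(2, 3, 28, 28, dtype=torch.float64)

# The fix: follow the dtype of the inputs before calling conv2d.
if kernel.dtype != y_pred.dtype:
    kernel = kernel.to(dtype=y_pred.dtype)

out = F.conv2d(y_pred, kernel, groups=3)  # succeeds; would raise without the cast
print(out.dtype)  # torch.float64
```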
@MarcBresson thanks for the fix. Can you also write a test here: https://github.com/pytorch/ignite/blob/master/tests/ignite/metrics/test_ssim.py? Let's create a separate test function like …
Nice idea to write the test! I just discovered that a few operations are not available in float16 on CPU. Working on a fix.
Just need a little help: how do I exclude a test from being run on CPU? I don't see any exclusion pattern in … And by the way, how is it possible that some tests fail while I haven't touched anything in this repo yet?
I think the simplest way to do that is to just call pytest: …
GH CI is rather flaky so some jobs are failing due to issues with distributed configs for the tests. Do not worry about that. |
Some functions used in SSIM.update do not exist for CPU and the half dtype, so the CPU tests are excluded.
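A minimal sketch of one common exclusion pattern (assumed here, not necessarily the one the PR ended up using): gate half-precision cases on CUDA availability with pytest.mark.skipif.

```python
import pytest
import torch

from ignite.metrics import SSIM


# Skip on machines without a GPU, since some ops used by SSIM.update
# are not implemented for half tensors on CPU.
@pytest.mark.skipif(not torch.cuda.is_available(), reason="Skip if no GPU")
def test_ssim_float16_on_cuda():
    y_pred = torch.rand(2, 3, 28, 28, device="cuda", dtype=torch.float16)
    y = y_pred * 0.8
    ssim = SSIM(data_range=1.0, device="cuda")
    ssim.update((y_pred, y))
    assert float(ssim.compute()) <= 1.0  # SSIM is bounded above by 1
```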
Here is a little analysis of the difference between ignite and scikit-image. Code sample:

```python
import torch
from ignite.metrics import SSIM
import tqdm
from skimage.metrics import structural_similarity as ski_ssim
import numpy as np
from matplotlib import pyplot as plt


def test_ssim(available_device, shape, dtype):
    y_pred = torch.rand(shape, device=available_device, dtype=dtype)
    y = y_pred * 0.8

    sigma = 1.5
    data_range = 1.0
    ssim = SSIM(data_range=data_range, sigma=sigma, device=available_device)
    ssim.update((y_pred, y))
    ignite_ssim = ssim.compute()

    # numpy has no bfloat16, so cast before moving to the CPU
    if y_pred.dtype == torch.bfloat16:
        y_pred = y_pred.to(dtype=torch.float16)

    skimg_pred = y_pred.cpu().numpy()
    skimg_y = skimg_pred * 0.8
    skimg_ssim = ski_ssim(
        skimg_pred,
        skimg_y,
        win_size=y_pred.shape[0] - 1,
        sigma=sigma,
        channel_axis=1,
        data_range=data_range,
    )

    difference = ignite_ssim - skimg_ssim
    return difference


diff = {}
for device in [torch.device("cpu"), torch.device("cuda")]:
    available_dtypes = [torch.float32, torch.float64]
    if device == torch.device("cuda"):
        available_dtypes.extend((torch.bfloat16, torch.float16))

    for dtype in available_dtypes:
        diff[str(dtype) + " on " + str(device)] = []
        for _ in tqdm.tqdm(range(1000)):
            difference_ssim = test_ssim(device, (12, 3, 28, 28), dtype=dtype)
            diff[str(dtype) + " on " + str(device)].append(difference_ssim)

for key, values in diff.items():
    # they are too far apart, so we exclude them
    if key in ["torch.float16 on cuda", "torch.bfloat16 on cuda"]:
        continue
    plt.hist(values, bins=10, label=key)

axes = plt.gca()
axes.ticklabel_format(style="scientific", scilimits=(0, 3))
plt.title("SSIM difference between ignite and scikit-image")
plt.xlabel("difference value")
plt.ylabel("frequency")
plt.legend()
plt.show()
```
@MarcBresson can you run the following to fix the formatting issue: …
Looks good to me, thanks for the PR @MarcBresson!
I'll merge it once the required CI jobs are done.
By the way, while checking your PR I saw in the actual code that we are doing rather strange things with the device of self._kernel. Would you like to tackle this problem?
Basically, self._kernel should respect self._device, but there may be a trade-off in running all padding and conv ops on CUDA before moving to self._device. We may want to measure that and make an appropriate decision.
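To quantify that trade-off, a rough measurement sketch (assumed shapes and kernel size, not part of the PR) could compare the grouped conv on CPU against the conv on CUDA followed by the move back:

```python
import torch
import torch.nn.functional as F
from torch.utils import benchmark

y = torch.rand(12, 3, 512, 512)  # assumed batch of images
k = torch.rand(3, 1, 11, 11)     # assumed Gaussian-like kernel, groups=3

# Baseline: convolve directly on CPU.
t_cpu = benchmark.Timer(
    stmt="F.conv2d(y, k, groups=3)",
    globals={"F": F, "y": y, "k": k},
).timeit(100)
print(t_cpu)

# Alternative: convolve on CUDA, then move the result to the target device.
if torch.cuda.is_available():
    y_cuda, k_cuda = y.cuda(), k.cuda()
    t_cuda = benchmark.Timer(
        stmt="F.conv2d(y, k, groups=3).cpu()",
        globals={"F": F, "y": y_cuda, "k": k_cuda},
    ).timeit(100)
    print(t_cuda)
```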
@vfdev-5 you are welcome! Happy to help. I think that … The …
Well, this code: … seems a bit weird, as only a kernel with ndims < 4 will have y_pred.device. And do we need to change …
Just a new analysis, this time on dtype. These are comparisons of precision between float32 and each of float16, bfloat16 and float64. The error is signed rather than absolute, so we can see whether a dtype overshoots compared to float32.

```python
import tqdm
import torch
from ignite.metrics import SSIM
from matplotlib import pyplot as plt


def test_ssim_dtype_error(available_device, shape, dtype):
    y_pred_float32 = torch.rand(shape, device=available_device, dtype=torch.float32)
    y_float32 = y_pred_float32 * 0.8

    y_pred_typed = torch.clone(y_pred_float32)
    y_pred_typed = y_pred_typed.to(dtype=dtype)
    y_typed = y_pred_typed * 0.8

    sigma = 1.5
    data_range = 1.0

    ssim = SSIM(data_range=data_range, sigma=sigma, device=available_device)
    ssim.update((y_pred_float32, y_float32))
    ssim_float32 = ssim.compute()

    ssim_ty = SSIM(data_range=data_range, sigma=sigma, device=available_device)
    ssim_ty.update((y_pred_typed, y_typed))
    ssim_typed = ssim_ty.compute()

    difference = ssim_float32 - ssim_typed
    return difference


diff = {}
for device in [torch.device("cuda")]:
    available_dtypes = [torch.float32, torch.float64]
    if device == torch.device("cuda"):
        available_dtypes.extend((torch.bfloat16, torch.float16))

    for dtype in available_dtypes:
        diff[str(dtype) + " on " + str(device)] = []
        for _ in tqdm.tqdm(range(1000)):
            difference_ssim = test_ssim_dtype_error(device, (12, 3, 28, 28), dtype=dtype)
            diff[str(dtype) + " on " + str(device)].append(difference_ssim)

for dtype in diff:
    plt.hist(diff[dtype], bins=50)
    axes = plt.gca()
    axes.ticklabel_format(axis="x", style="scientific", scilimits=(0, 0))
    plt.title(f"SSIM difference between float32 and a {dtype}")
    plt.xlabel("difference value")
    plt.ylabel("frequency")
    plt.show()
```





When the SSIM metric is given a float16 (Half) or float64 (Double) input, it fails because self._kernel will be of the default dtype.
There are two ways around that:

- ask the user to change the global default dtype, e.g. torch.set_default_dtype(torch.float16);
- switch the dtype of self._kernel to match the input args.

This pull request puts the second option in action. The result is now transparent to the user, who can use float16 or float64 on GPU directly.
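For illustration, a usage sketch of the behaviour after this change (hypothetical shapes; assumes the fix is in place):

```python
import torch
from ignite.metrics import SSIM

# float64 inputs now work out of the box; self._kernel is cast to match them.
y_pred = torch.rand(4, 3, 28, 28, dtype=torch.float64)
y = y_pred * 0.9

metric = SSIM(data_range=1.0)
metric.update((y_pred, y))
print(metric.compute())
```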
A new test has been added to ensure the right behaviour of this change.
Check list: