The evaluation results on the DIV2K validation set are similar to the reported ones. But when I evaluated on Set5, Set14, and other benchmark datasets, the results I got were very low. They should be more than 36 dB PSNR for the x2 scale factor, but I got only about 31 dB PSNR for x2 on the Set5 dataset.
In your evaluation code, I only replaced the DIV2K validation set images with the Set5 images.
Can you help me solve this problem?
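
For reference, here is a minimal sketch of the PSNR computation I have in mind, in case the gap comes from a difference in evaluation protocol. This is only my assumption and is not taken from your code: benchmark numbers for Set5 and Set14 are often reported on the Y (luma) channel rather than on RGB, so I include a BT.601 conversion as well. The helper names `psnr` and `rgb_to_y` are hypothetical, and the inputs are assumed to be 8-bit pixel values.

```python
import math


def psnr(reference, estimate, max_value=255.0):
    """PSNR in dB between two equally sized flat sequences of pixel values."""
    if len(reference) != len(estimate) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def rgb_to_y(r, g, b):
    """BT.601 luma for 8-bit RGB, range 16..235 (assumed convention)."""
    return 16.0 + (65.481 * r + 128.553 * g + 24.966 * b) / 255.0


# Toy example: every pixel is off by exactly 1, so MSE = 1.
print(round(psnr([10, 20, 30], [11, 21, 31]), 2))
```

If the reported numbers were computed on the Y channel and my run was computed on RGB, or the other way around, that could explain part of the difference, but I have not confirmed this.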