Increasing code coverage by mocking datasets #490
Conversation
Codecov Report

✅ All tests successful. No failed tests found.

Coverage Diff (main vs. #490):

| | main | #490 | +/- |
|---|---|---|---|
| Coverage | 78.55% | 80.80% | +2.25% |
| Files | 182 | 187 | +5 |
| Lines | 15933 | 16615 | +682 |
| Branches | 2100 | 2184 | +84 |
| Hits | 12516 | 13426 | +910 |
| Misses | 2717 | 2443 | -274 |
| Partials | 700 | 746 | +46 |

☔ View full report in Codecov by Sentry.
This is really cool, Jérémy. Looking forward to reviewing this, when you're ready.
What do you think of this way of doing things? @Andrewwango
@jscanvic thanks for doing this hard work. I really agree with this point (and it is not just true of datasets; the same applies to mocking other things, e.g. external libraries, maybe even long-running algorithms).
It would be worth adding best-practice instructions somewhere in the repo about how we write tests, i.e. write tests for the intended outcome of the code and then write code to pass them, rather than writing code first and then writing tests that pass on that code. AFAIK the remaining TODOs for #487 wrt. datasets are:
Thanks @Andrewwango! Do we want to mock the datasets that currently download just fine on the CI server? It is not clear to me. I think all of the datasets are now tested, mocked or not (though coverage can still be increased). I added an option for mocking using an environment variable; it gets in the way less, in my opinion. I'll update the contributing guide later.
Mocking the other datasets would make things more uniform, I guess, but you're right that there's no point in mocking them if we also want to test downloading them.
Andrewwango left a comment
Thanks for your hard work, especially dealing with the CT dataset; mostly small comments.
Thanks for taking the time to review this PR, @Andrewwango!
Following what @romainvo does in #474, I changed some of the calls from `patch("os.listdir", return_value=[f"{i}_HR.png" for i in range(1, 15)])` to `patch.object(os, "listdir", return_value=[f"{i}_HR.png" for i in range(1, 15)])`.
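For readers unfamiliar with the two forms, here is a minimal, self-contained sketch (not taken from the deepinv test suite) showing that the string-based and object-based targets achieve the same substitution; the file names and counts are illustrative only.

```python
# Minimal illustration of the two equivalent ways of mocking os.listdir.
# The file names and the range are made up for the example.
import os
from unittest.mock import patch


def test_listdir_with_string_target():
    # String target: the attribute to replace is named by its dotted path.
    with patch("os.listdir", return_value=[f"{i}_HR.png" for i in range(1, 15)]):
        assert len(os.listdir("some/dataset/root")) == 14


def test_listdir_with_patch_object():
    # Object target: the module object is referenced directly, which avoids
    # spelling the dotted path as a string.
    with patch.object(os, "listdir", return_value=[f"{i}_HR.png" for i in range(1, 15)]):
        assert os.listdir("some/dataset/root")[0] == "1_HR.png"
```

Both forms restore the real `os.listdir` when the `with` block exits.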
Andrewwango left a comment
Responding to previous comments, looks good so far, happy to approve once these have been addressed!
Co-authored-by: Andrewwango <[email protected]>
Thanks Andrew! I made the changes.
Andrewwango left a comment
LGTM
Exploratory pull request towards solving #487
Data that cannot be loaded onto the CI servers can be mocked using `unittest.mock.patch`.

Code used for testing should ideally only test what the implementation does, not how it does it. This avoids brittleness and testing the wrong thing. Mocking is an exception to this rule since, by its very nature, it alters the implementation by changing what the subroutines it calls do. This means that mocking code is necessarily coupled with the implementation, which is okay, but it should ideally be kept to a minimum.
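As a rough illustration of that principle (the dataset class and test below are hypothetical, not deepinv code), the idea is to patch only the call that touches data which cannot live on CI, and to assert on the dataset's public behaviour rather than its internals:

```python
# Hypothetical example: ImageFolderDataset is a made-up stand-in for a dataset
# that scans a directory of images. Only the filesystem access is patched; the
# assertion goes through the dataset's public interface.
import os
from unittest.mock import patch


class ImageFolderDataset:
    """Toy dataset that lists the PNG files found under a root directory."""

    def __init__(self, root):
        self.files = sorted(f for f in os.listdir(root) if f.endswith(".png"))

    def __len__(self):
        return len(self.files)


def test_length_with_mocked_directory_listing():
    fake_files = [f"{i:02d}.png" for i in range(10)] + ["readme.txt"]
    # The only implementation detail the test depends on is that the dataset
    # lists its root directory; everything else is observed through __len__.
    with patch("os.listdir", return_value=fake_files):
        dataset = ImageFolderDataset(root="unused/path")
    assert len(dataset) == 10
```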
From what I can tell, the best way to write tests with mocked data might be to write tests that would pass for the actual data, and then mock the data to make them pass, rather than starting by mocking the data and cobbling implementation-dependent tests around it. In the case of DeepInverse, we can probably play with the pytest fixtures which are currently used (in part) for downloading datasets before testing them. At least some of those tests are currently marked as skipped (`test_load_Kohler_dataset`, `test_load_set14_dataset`) because the corresponding datasets are too big to be downloaded on CI servers. A good way to proceed might be to do the mocking there, by mocking the datasets instead of downloading them, ideally with some automated logic to determine whether mocking is necessary or not.
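One possible shape for such a fixture is sketched below with made-up names (`DEEPINV_MOCK_DATASETS`, `download_set14`, the `*_HR.png` layout are assumptions, not deepinv's actual setup): when an environment variable requests mocking, the fixture fabricates the expected directory layout instead of downloading, and the test is written so it would also pass against the real data.

```python
# Hypothetical conftest.py sketch; the environment variable, helper and file
# layout are illustrative assumptions only.
import os

import pytest


def download_set14(root):
    # Placeholder for the real download helper; never called when mocking.
    raise NotImplementedError("downloading is out of scope for this sketch")


@pytest.fixture
def set14_root(tmp_path):
    if os.environ.get("DEEPINV_MOCK_DATASETS") == "1":
        # Mocked path: fabricate the directory layout the dataset expects.
        for i in range(1, 15):
            (tmp_path / f"{i}_HR.png").touch()
    else:
        # Real path: actually fetch the data (too big for the CI servers).
        download_set14(tmp_path)
    return tmp_path


def test_set14_has_fourteen_images(set14_root):
    # Written against the on-disk layout, so it would also pass with the
    # genuinely downloaded dataset.
    assert len(list(set14_root.glob("*_HR.png"))) == 14
```

Running `DEEPINV_MOCK_DATASETS=1 python3 -m pytest tests/` would then exercise the mocked path.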
Coverage

- `deepinv.datasets.lidc_idri`: 31% ➡️ 80%
- `deepinv.datasets.set14`: 31% ➡️ 56%
- `deepinv.datasets.kohler`: 30% ➡️ 79%
- `deepinv.datasets.fmd`: 24% ➡️ 67%
- `deepinv.datasets.lsdir`: 24% ➡️ 57%
- `deepinv.datasets.flickr2k`: 31% ➡️ 56%
- `deepinv.datasets`: +13% overall
Bugs found in the process

- `self` is currently missing in `Kohler.get_blurry_shot` and `Kohler.select_frame` (see the sketch below).
- `LidcIdriDataset` does not support the minimum version of Python supported by the library.
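As a generic illustration of the first bug (the snippet below is not the actual Kohler code), a method defined without `self` only blows up once it is called on an instance, which is exactly the kind of error that mocked-dataset tests can now catch:

```python
# Not the actual Kohler implementation: a minimal reproduction of the pitfall.
class Dataset:
    def select_frame(index):  # `self` is missing from the signature
        return index


dataset = Dataset()
try:
    dataset.select_frame(0)
except TypeError as error:
    # The instance is passed implicitly as the first argument, so Python
    # reports that the method received too many positional arguments.
    print(error)
```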
Checks to be done before submitting your PR

- `python3 -m pytest tests/` runs successfully.
- `black .` runs successfully.
- `make html` runs successfully (in the `docs/` directory).