
More efficient SmartCacheLoader#2181

Merged
wyli merged 8 commits into Project-MONAI:dev from cgrain:CoenGruijt_Patch_efficient_loader on May 12, 2021

Conversation

@cgrain
Contributor

@cgrain cgrain commented May 12, 2021

Description

On this website it is advised to delete tensors explicitly (with del) when the program is done with them. Applying that advice, I have made a small change in SmartCacheLoader that reflects it. I have also added a small unit test that ensures the updated cache is indeed what it needs to be.
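To illustrate, the core of the change amounts to something like the following (a sketch of the idea only; `replace_cache_in_place` is a hypothetical name, not MONAI's actual function):

```python
def replace_cache_in_place(cache, replacements):
    # Sketch, not MONAI's exact code: evict the oldest items with `del`
    # so their references are released immediately, then append the new
    # items without building a full-size copy of the cache list.
    del cache[: len(replacements)]
    cache.extend(replacements)
    return cache

print(replace_cache_in_place(list(range(5)), ["a", "b"]))  # [2, 3, 4, 'a', 'b']
```

Because the list is mutated in place, no second full-size list of references ever exists alongside the old one.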

Status

Ready

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Integration tests passed locally by running ./runtests.sh -f -u --net --coverage.
  • Quick tests passed locally by running ./runtests.sh --quick --unittests.
  • In-line docstrings updated.
  • Documentation updated, tested make html command in the docs/ folder.

@rijobro
Contributor

rijobro commented May 12, 2021

Hi, I read through the blog you linked, and guess you're referring to this paragraph:

Free up memory using del
This is a common pitfall for new PyTorch users, and we think it isn’t documented enough.
After you’re done with some PyTorch tensor or variable, delete it using the python del operator to free up memory.

In the blog post there is no data to back up this claim. As such, I think we'd need to see some numbers that your implementation uses less memory than the current one.

@rijobro
Contributor

rijobro commented May 12, 2021

Also, although I agree that the current implementation is a little ugly, it seems to have unnecessary for loops. How about:

self._cache = self._cache[self._replace_num:self._replace_num + remain_num] + self._replacements[:self._replace_num]


@cgrain
Contributor Author

cgrain commented May 12, 2021

Thank you for your quick reply!

Your request for some data is very reasonable, but unfortunately I only have anecdotal evidence. I implemented my method in my own project, and instead of SLURM killing my job for running out of memory, the job ran without problems. Of course, I have not implemented or tested your method. I will see if I can create a test that shows the difference in memory use. The difficulty in checking that thoroughly is that Python's garbage collector (which is not completely deterministic) can either be quick, which will result in a small difference, or slow, which will result in higher memory spikes.
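One way to put rough numbers on this with only the standard library is `tracemalloc` (a sketch; the function names here are illustrative, not MONAI's, and with lists of small integers it only measures the list-of-references overhead, not tensor payloads):

```python
import tracemalloc

def update_by_slicing(cache, new_items):
    # builds a full-size temporary list before the old one can be collected
    return cache[len(new_items):] + new_items

def update_in_place(cache, new_items):
    # evicted entries are deleted first, so no full-size copy is created
    del cache[: len(new_items)]
    cache.extend(new_items)
    return cache

def peak_bytes(fn, cache_num=100_000, replace_num=20_000):
    # peak allocation (in bytes) while running one cache update
    cache = list(range(cache_num))
    new = list(range(replace_num))
    tracemalloc.start()
    fn(cache, new)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

print("slicing :", peak_bytes(update_by_slicing))
print("in place:", peak_bytes(update_in_place))
```

With real tensors as elements, the slicing variant additionally keeps every evicted tensor alive until the old list is collected, which is where the spikes would come from.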

@rijobro
Contributor

rijobro commented May 12, 2021

Sounds good, I'm curious to see what your benchmarking gives. My worry is that if we can't trust the garbage collector here, then we can't trust it anywhere in the code base and will have to delete everything everywhere. My default position is therefore to trust the garbage collector!

@wyli
Contributor

wyli commented May 12, 2021

Did a quick test; it seems test2 is good:

import random

def test1(cache_num=100, replace_num=20):
    # element-by-element shift, then overwrite the tail
    _cache = list(range(cache_num))
    replace = [random.random() for _ in range(replace_num)]
    remain_num = cache_num - replace_num
    for i in range(remain_num):
        _cache[i] = _cache[i + replace_num]
    for i in range(replace_num):
        _cache[remain_num + i] = replace[i]
    return _cache

def test2(cache_num=100, replace_num=20):
    # delete the evicted items in place, then append the replacements
    _cache = list(range(cache_num))
    replace = [random.random() for _ in range(replace_num)]

    del _cache[:replace_num]
    _cache.extend(replace)
    return _cache

def test3(cache_num=100, replace_num=20):
    # build a new list by slicing and concatenating
    _cache = list(range(cache_num))
    replace = [random.random() for _ in range(replace_num)]
    remain_num = cache_num - replace_num
    _cache = _cache[replace_num : replace_num + remain_num] + replace[:replace_num]
    return _cache

[Screenshot: benchmark comparison of test1/test2/test3]
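The comparison in the screenshot can be reproduced along these lines with `timeit` (a sketch, not the original benchmark; the function names and iteration counts are arbitrary):

```python
import random
import timeit

CACHE_NUM, REPLACE_NUM = 100, 20
REMAIN_NUM = CACHE_NUM - REPLACE_NUM

def shift_loop(cache, replace):
    # test1-style: element-by-element shift and overwrite
    for i in range(REMAIN_NUM):
        cache[i] = cache[i + REPLACE_NUM]
    for i in range(REPLACE_NUM):
        cache[REMAIN_NUM + i] = replace[i]
    return cache

def del_extend(cache, replace):
    # test2-style: in-place delete + extend
    del cache[:REPLACE_NUM]
    cache.extend(replace)
    return cache

def slice_concat(cache, replace):
    # test3-style: slice + concatenate into a new list
    return cache[REPLACE_NUM:] + replace

for name, fn in [("shift_loop", shift_loop),
                 ("del_extend", del_extend),
                 ("slice_concat", slice_concat)]:
    t = timeit.timeit(
        lambda: fn(list(range(CACHE_NUM)),
                   [random.random() for _ in range(REPLACE_NUM)]),
        number=10_000,
    )
    print(f"{name}: {t:.3f}s")
```

All three variants produce the same cache contents for the same inputs; only the amount of copying and temporary allocation differs.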

@wyli
Contributor

wyli commented May 12, 2021

/black
thanks @cgrain, could you address this DCO issue by signing the commits? https://github.com/Project-MONAI/MONAI/pull/2181/checks?check_run_id=2565463758

@cgrain cgrain force-pushed the CoenGruijt_Patch_efficient_loader branch from 1c2507d to 19c6c55 Compare May 12, 2021 14:07
@cgrain
Contributor Author

cgrain commented May 12, 2021

Thanks @wyli, DCO should be signed now!

@wyli wyli enabled auto-merge (squash) May 12, 2021 17:15
@wyli wyli merged commit 4ef7d22 into Project-MONAI:dev May 12, 2021
yanielc pushed a commit to yanielc/MONAI that referenced this pull request May 13, 2021
* better way of managing Cache

Signed-off-by: Coen <[email protected]>

* Update test_smartcachedataset.py

Signed-off-by: Coen <[email protected]>
Signed-off-by: Yaniel Cabrera <[email protected]>
wyli pushed a commit that referenced this pull request May 26, 2021
* better way of managing Cache

Signed-off-by: Coen <[email protected]>

* Update test_smartcachedataset.py

Signed-off-by: Coen <[email protected]>
wyli pushed a commit that referenced this pull request May 27, 2021
* better way of managing Cache

Signed-off-by: Coen <[email protected]>

* Update test_smartcachedataset.py

Signed-off-by: Coen <[email protected]>