cleaning up data api file cache#737
Conversation
… files from ~/climada/data
|
@chahank: let me know if this acceptably solves the ever increasing api data heap problem. If so I'll add documentation. |
chahank
left a comment
There was a problem hiding this comment.
That was a very quick fix, thanks! Just a few small comments, but otherwise it looks great to me (when the documentation will be added :)).
climada/util/api_client.py
Outdated
| " or remove cache entry from database by calling" | ||
| f" `Client.purge_cache_db(Path('{local_path}'))`!") |
There was a problem hiding this comment.
I think this is a bit to difficult to understand. When should a user purge the cache? When should a user wait for the download? Any way to help the user better here?
There was a problem hiding this comment.
The exception is more verbose now.
climada/util/api_client.py
Outdated
| with the API client, if they are beneath the given directory and if one of the following | ||
| is the case: | ||
| - there status is neither 'active' nor 'test_dataset' | ||
| = their status is 'test_dataset' and keep_testfiles is set to False |
There was a problem hiding this comment.
| = their status is 'test_dataset' and keep_testfiles is set to False | |
| - their status is 'test_dataset' and keep_testfiles is set to False |
climada/util/api_client.py
Outdated
|
|
||
| # collect urls from datasets that should not be removed | ||
| test_datasets = self.list_dataset_infos(status='test_dataset') if keep_testfiles else [] | ||
| test_urls = set(filinf.url for dsinf in test_datasets for filinf in dsinf.files) |
There was a problem hiding this comment.
I do not understand the short variables dsinf and filinf. Are these some standard abbreviations?
There was a problem hiding this comment.
they're called ds_info and file_info now
climada/util/api_client.py
Outdated
| rm_empty_dirs(subdir) | ||
| try: | ||
| directory.rmdir() | ||
| except OSError: |
There was a problem hiding this comment.
What error is ignored here? I do not understand.
There was a problem hiding this comment.
I've added an inline comment # raised when directory is not empty
| Path(temp_dir).joinpath('hazard/tropical_cyclone/rename_files2/v1').is_dir() | ||
| ) | ||
| self.assertEqual( # test files are still there | ||
| 3, |
There was a problem hiding this comment.
that's just the nature of that test dataset. datasets can have any number of files.
There was a problem hiding this comment.
thanks. I meant, why does this particular test result in 3 files?
There was a problem hiding this comment.
Beats me. I picked it for being small (in file size) and expired. I suspect it's an experimental dataset that was used to explore the data api itself.
There was a problem hiding this comment.
Should we use a better-known test file then? The test looks rather mysterious like this.
There was a problem hiding this comment.
Oh. Wrong. Sorry. The one with 3 files is used in TestStormEurope.test_icon_read. Reading icon files takes a directory as input and collects data from there. Having more than one makes complete sense. And the test is fairly known. Apart from that I think it doesn't really matter which dataset we pick as long as size is acceptable and status and version make a difference.
There was a problem hiding this comment.
All right, if it is clear to you, I am fine with it.
|
@chahank many thanks for the equally quick review! 🙌 |
Changes proposed in this PR:
This PR fixes #
PR Author Checklist
develop)PR Reviewer Checklist