Skip to content

Anopheles refactor part 1 - AnophelesBase#380

Merged
alimanfoo merged 9 commits intomalariagen:masterfrom
alimanfoo:anopheles-refactor-base-2023-03-30
Mar 31, 2023
Merged

Anopheles refactor part 1 - AnophelesBase#380
alimanfoo merged 9 commits intomalariagen:masterfrom
alimanfoo:anopheles-refactor-base-2023-03-30

Conversation

@alimanfoo
Copy link
Copy Markdown
Member

@alimanfoo alimanfoo commented Mar 31, 2023

This PR begins the work of breaking up the AnophelesDataResource class into a number of smaller classes with related and self-contained functionality, as proposed in #366.

Here a new class AnophelesBase is created, which assumes responsibility for loading release configuration and accessing release manifests (sample sets).

Note also that some new tests are added which specifically test this new class. These new tests use locally-created data which mimics the organisation and format of data in Ag3 and Af1 but is local and smaller to allow for faster test runs.

Also resolves #368 along the way.

site_filters_analysis=None,
pre=False,
**kwargs, # used by simplecache, init_filesystem(url, **kwargs)
**storage_kwargs, # used by fsspec via init_filesystem(url, **kwargs)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename for clarity.

Comment on lines +149 to +151
gcs_url=GCS_URL,
major_version_number=MAJOR_VERSION_INT,
major_version_path=MAJOR_VERSION_GCS_STR,
Copy link
Copy Markdown
Member Author

@alimanfoo alimanfoo Mar 31, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving these parameters to the super-class constructor makes testing much easier.

Comment on lines +61 to +63
# test duplicates are handled
df_dup = anoph.sample_sets(release=[release, release])
assert_frame_equal(df_sample_sets, df_dup)
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Slight change of behaviour here, seemed more friendly just to handle and dups.

Comment on lines -13 to -17
matrix:
python-version: ['3.10']
poetry-version: ['1.4.1']
os: [ubuntu-latest]
runs-on: ${{ matrix.os }}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking the opportunity to simplify the action, we don't actually need a matrix here.

Comment on lines 14 to -17
python-version: ['3.8', '3.9', '3.10']
poetry-version: ['1.4.1']
os: [ubuntu-latest]
runs-on: ${{ matrix.os }}
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Taking the opportunity to simplify the action, keeping only the parameters which are needed in the matrix.

Comment on lines +48 to 74
Install Python 3.8 (current recommended version for local development), e.g.:

```bash
sudo add-apt-repository ppa:deadsnakes/ppa
sudo apt install python3.10 python3.10-venv
sudo apt install python3.8 python3.8-venv
```

Install pipx, e.g.:

```bash
python3.10 -m pip install --user pipx
python3.10 -m pipx ensurepath
python3.8 -m pip install --user pipx
python3.8 -m pipx ensurepath
```

Install [poetry](https://python-poetry.org/docs/#installation), e.g.:

```bash
pipx install poetry==1.4.1 --python=/usr/bin/python3.10
pipx install poetry==1.4.1 --python=/usr/bin/python3.8
```

Create development environment:

```bash
cd malariagen-data-python
poetry use 3.10
poetry use 3.8
poetry install
```
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommending Python 3.8 for local development, because otherwise if we use a later Python we can accidentally try to use features not supported in Python 3.8, which doesn't then crop up until CI fails. Better to fail fast locally.

Comment on lines +32 to +34
# Run a subset of tests first which run quickly to fail fast.
- name: Run fast unit tests
run: poetry run pytest -v tests/anoph
Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pulling out a subset of tests here which are guaranteed to run fast, so we fail faster if there are any obvious bugs.

@alimanfoo alimanfoo marked this pull request as ready for review March 31, 2023 12:51
@alimanfoo alimanfoo requested review from leehart and sanjaynagi March 31, 2023 12:55
@alimanfoo alimanfoo mentioned this pull request Mar 31, 2023
24 tasks
@alimanfoo alimanfoo changed the title Anopheles refactor part 1 (base) Anopheles refactor part 1 - AnophelesBase Mar 31, 2023
@alimanfoo
Copy link
Copy Markdown
Member Author

Keen to push ahead here so planning to merge soon if no objections, but please shout if you'd like more time to review.

@alimanfoo alimanfoo merged commit 1285723 into malariagen:master Mar 31, 2023
@alimanfoo alimanfoo deleted the anopheles-refactor-base-2023-03-30 branch March 31, 2023 16:19
@alimanfoo
Copy link
Copy Markdown
Member Author

Thanks @leehart 🙏

@sanjaynagi
Copy link
Copy Markdown
Collaborator

sanjaynagi commented Mar 31, 2023 via email

@sanjaynagi
Copy link
Copy Markdown
Collaborator

sanjaynagi commented Mar 31, 2023 via email

@alimanfoo alimanfoo added the BMGF-001927 Work supported by BMGF grant INV-001927 (MalariaGEN 2019-2024). label Dec 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

BMGF-001927 Work supported by BMGF grant INV-001927 (MalariaGEN 2019-2024).

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Only check client location if we're in colab

3 participants