This repository provides infrastructure tools to run omop_es and omop-cascade in containerised
environments on the GAE.
In addition, the prefect/ directory provides data workflow orchestration with Prefect to allow
automatic scheduling of omop_es runs.
Warning
EXPERIMENTAL
First set the required environment variables:
cp template.env .env
and fill out the .env file as needed. The prefect_server takes the following environment
variables:
- PREFECT_SERVER_API_HOST: The host for the Prefect server API, typically localhost, or the GAE host name. The default is localhost.
- PREFECT_SERVER_API_PORT: The port for the Prefect server API. The default is 4200.
The following two variables are required; the prefect_server Docker service will fail to start
without them:
- PREFECT_SERVER_API_AUTH_STRING: authentication setting for the process hosting the Prefect server
- PREFECT_API_AUTH_STRING: authentication for any client process that needs to communicate with the Prefect API (e.g. when deploying workflows)
These two should match and should be a string with an administrator / password combination,
separated by a colon, e.g. admin:password. With these settings, the dashboard UI will prompt for
the full authentication string (e.g. admin:password) upon first load.
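Since a mismatch between the two strings only shows up once a client fails to authenticate, a quick check before starting the server can help. The following is a minimal sketch, not part of this repository: the function name check_prefect_auth is hypothetical, and it only assumes the user:password format described above.

```shell
# Hypothetical helper (not part of this repository).
# Prints "ok" if both auth strings are set, identical, and of the form
# user:password; otherwise prints the reason and returns non-zero.
check_prefect_auth() {
    local server_auth="$1"
    local client_auth="$2"
    if [ -z "$server_auth" ] || [ -z "$client_auth" ]; then
        echo "missing"
        return 1
    fi
    if [ "$server_auth" != "$client_auth" ]; then
        echo "mismatch"
        return 1
    fi
    case "$server_auth" in
        # At least one character on each side of a colon.
        ?*:?*) echo "ok" ;;
        *) echo "malformed"; return 1 ;;
    esac
}
```

After loading the .env file into the current shell, it could be called as check_prefect_auth "$PREFECT_SERVER_API_AUTH_STRING" "$PREFECT_API_AUTH_STRING".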
We provide a Makefile to run the relevant deployment commands. Run make help to see the
available commands.
To spin up the server run:
make start-server
You should now see the dashboard live at http://localhost:4200/dashboard. The port can be configured
through the PREFECT_SERVER_API_PORT environment variable.
To see the Prefect config, including connection details, run:
make config
The Prefect server is running inside a Docker container. You can access the logs for the server by
running (the -f option will continuously stream the logs):
docker compose logs -f prefect_server
Before deploying Prefect flows, we need an active worker pool. To run any of the deployed flows, we also need a running worker in that pool.
At the moment, we are using simple "process" workers, meaning the worker will run in a subprocess from wherever the worker is started.
To start a worker, run the following command in a different terminal window/session than where you started the server:
make start-worker
This will run in the foreground and block your shell, so run this in a tmux (or similar) session.
You can use this to monitor the logs for any flow that uses this worker (the logs will also show up
in the Prefect dashboard).
Deployments are used to configure how Prefect flows should be run. Their configuration is
defined in prefect.yaml.
In a third terminal window/session, separate from those running the worker and the server, create a deployment by running:
make deploy
This will open an interactive shell where you can select a specific flow to deploy.
Note: when Prefect prompts
? Would you like to save configuration for this deployment for faster deployments in the future?
you typically want to decline by typing n. This stops Prefect from overwriting the existing
configuration in prefect.yaml and potentially hardcoding absolute file paths.
Alternatively, you can create all deployments defined in prefect.yaml by running:
make deploy-all
This will push the configuration for the Prefect flows to the server and make them ready to run. You should now be able to see any deployments you created in the Deployments section of the Prefect dashboard.
If any of the deployments have a schedule, they will start running automatically based on their schedule.
To manually trigger a flow run, you can use the dashboard or the Prefect CLI:
uv run prefect deployment run '<deployment_name>'
To stop the server:
make stop-server
This will preserve existing deployments and settings. Workers will also continue to run but will be suspended until the server is restarted.
To take the server down:
make down
This will take down the Docker container running the server. The Prefect database is mounted as a volume and so will be preserved when the server is brought up again.
To reset the prefect database (requires the server to be running):
Warning
This is a destructive operation. It will delete all deployments, flows, tasks, and data stored in the Prefect database. This operation cannot be undone.
make reset-prefect-database
To run Prefect on the GAE, make sure to set the following environment variables in the root
directory .env:
# For GAE10, will differ for other GAE instances
PREFECT_SERVER_API_HOST=uclvlddpragae10
PREFECT_SERVER_API_PORT=8082
PREFECT_API_URL=http://uclvlddpragae10:8082/api
Port 4200 is unavailable on the GAE. With these settings, the dashboard will be hosted at
http://uclvlddpragae10:8082/dashboard (accessible through the UCLH network only).
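The three values above have to stay consistent: PREFECT_API_URL is just the host and port followed by /api. As a minimal sketch of that relationship, the hypothetical helper below (prefect_api_url is not part of this repository) derives the URL from the other two settings, falling back to the documented defaults of localhost and 4200.

```shell
# Hypothetical helper (not part of this repository).
# Builds the Prefect API URL from a host and port, using the documented
# defaults (localhost, 4200) when an argument is omitted.
prefect_api_url() {
    local host="${1:-localhost}"
    local port="${2:-4200}"
    printf 'http://%s:%s/api\n' "$host" "$port"
}
```

For GAE10, prefect_api_url uclvlddpragae10 8082 prints the same value as the PREFECT_API_URL line above.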
When running prefect commands (see above) on a GAE, make sure the GAE's address is included in the
NO_PROXY environment variable in the .env file and run uv with the --env-file .env flag, as
prefect doesn't pick up this variable automatically:
NO_PROXY="localhost,127.0.0.1,uclvlddpragae10"
When creating Prefect workers, the /tmp/runner_storage directory will be created and owned by
whoever launched the worker. Unfortunately, anyone else who subsequently tries to launch a worker
will get a permission error, as they won't have access to /tmp/runner_storage.
To get around this, the original creator of /tmp/runner_storage should relax the permissions by
running
chmod g+rwx /tmp/runner_storage
chgrp docker /tmp/runner_storage
This will have to be repeated whenever /tmp/runner_storage gets removed and recreated.
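Because the fix has to be reapplied each time the directory is recreated, it may be convenient to wrap the two commands in a small function. This is only a sketch under stated assumptions: the name fix_runner_storage is hypothetical, and the directory and group default to the values used above.

```shell
# Hypothetical helper (not part of this repository).
# Hands group ownership of the runner storage directory to the given group
# and grants that group read/write/execute. Only the directory's owner can
# do this, so the function refuses to run for anyone else.
fix_runner_storage() {
    local dir="${1:-/tmp/runner_storage}"
    local group="${2:-docker}"
    if [ ! -d "$dir" ]; then
        echo "$dir does not exist yet; start a worker first" >&2
        return 1
    fi
    if [ ! -O "$dir" ]; then
        echo "$dir is owned by someone else; ask its owner to run this" >&2
        return 1
    fi
    chgrp "$group" "$dir" && chmod g+rwx "$dir"
}
```

Calling fix_runner_storage with no arguments applies the same two commands as above to /tmp/runner_storage.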
Use docker compose build to build all images, or specify the image to build:
docker compose build omop_es
docker compose build omop-cascade
The OMOP_ES_VERSION environment variable controls which version of omop_es to use. It accepts:
- Branch name (e.g., master, feature/xyz) - Always pulls the latest commit from that branch
- Commit SHA (e.g., a1b2c3d4 or full SHA) - Pins to a specific commit (no automatic updates)
- Tag name (e.g., v1.0.0) - Pins to a specific release (no automatic updates)
When using a branch name, the container will run git pull to get the latest code on each run. When
using a commit SHA or tag, the container will check out that specific version without pulling
updates.
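The branch-versus-pinned behaviour can be sketched roughly as follows. This is an illustration only and is not taken from the entrypoint script in docker/, which may work differently: the function name checkout_version is hypothetical, and it assumes the remote is called origin and that a ref counts as a branch when origin has a branch of that name.

```shell
# Hypothetical sketch (the real entrypoint script may differ).
# Usage: checkout_version <repo_dir> <ref>
# Prints "branch" if the ref was a remote branch and was pulled,
# or "pinned" if it was a tag or commit SHA and was only checked out.
checkout_version() {
    local repo_dir="$1"
    local ref="$2"
    # Refresh remote branches and tags so the ref can be resolved.
    git -C "$repo_dir" fetch --quiet --tags origin || return 1
    git -C "$repo_dir" checkout --quiet "$ref" || return 1
    if git -C "$repo_dir" show-ref --verify --quiet "refs/remotes/origin/$ref"; then
        # The ref is a branch on the remote: fast-forward to its latest commit.
        git -C "$repo_dir" pull --quiet --ff-only origin "$ref" || return 1
        echo "branch"
    else
        # A tag or commit SHA: stay on exactly that version.
        echo "pinned"
    fi
}
```

In this sketch the value of OMOP_ES_VERSION would be passed as the second argument.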
Set environment variables:
# Copy the templates and fill out as needed
cp template.env .env
cp omop_es/template.env omop_es/.env
cp omop-cascade/template.env omop-cascade/.env
Then run omop_es:
docker compose --project-name <PROJECT-NAME> run --build \
--env SETTINGS_ID=<SETTINGS_ID> \
--env ... \
omop_es
Adjust the .env files accordingly.
docker compose -f docker-compose.prod.yml --project-name <PROJECT-NAME> run --build \
--env SETTINGS_ID=<SETTINGS_ID> \
--env ... \
omop_es
To be able to clone GitHub repos on a GAE, create a new fine-grained personal access token (PAT).
Make sure the "Resource owner" is set to uclh-criu, then select the repositories you want to access.
Submit the request to generate the token and copy the token to a safe place, as it
will not be shown again!
First store your PAT in a file on the GAE in the path ~/.pat.txt, then configure git to use the
token by running the following command:
git config --global credential.helper 'store --file ~/.pat.txt'
This process needs to be repeated for every GAE.
Additionally, record the token in the GITHUB_PAT environment variable in the .env file in this
repository's root.
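Git's store credential helper reads its file as one URL per line with the credentials embedded, in the form https://<user>:<token>@<host>. If ~/.pat.txt needs to be written in that form, the following sketch shows one way to do it. The function name write_pat_file and its arguments are hypothetical, and github.com as the host is an assumption.

```shell
# Hypothetical helper (not part of this repository).
# Writes a single credential line in the format used by git's "store"
# helper. A newly created file is readable by its owner only.
write_pat_file() {
    local user="$1"
    local token="$2"
    local file="${3:-$HOME/.pat.txt}"
    # The subshell keeps the restrictive umask from leaking into the caller.
    (
        umask 077
        printf 'https://%s:%s@github.com\n' "$user" "$token" > "$file"
    )
}
```

It would be called with a GitHub username and the token, for example write_pat_file <USERNAME> <TOKEN>, before running the git config command above.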
The main Prefect infrastructure is implemented in the prefect/ directory. The
Dockerfile and entrypoint script for the omop_es pipeline are located in the
docker/ directory.
We use pre-commit to enforce code style and formatting. Install it by
running pip install pre-commit and then run pre-commit install to install the hooks.
Tests for the Prefect workflow can be run using the following command:
make test