Steps to reproduce
- Have a multi-replica dstack server deployment
- Try upgrading the server version skipping many releases (e.g. from 0.19.20 to 0.19.34)
This may lead to Postgres deadlocks if there is enough background processing going on and you're unlucky.
Actual behaviour
The reason is that all alembic migrations are currently applied in one transaction on deploy:
    with context.begin_transaction():
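And while individual migrations try to respect lock ordering on tables, that ordering is not preserved across many migrations run in a single transaction, since the locks taken by earlier migrations are held until the whole batch commits.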
Expected behaviour
The solution is to run each migration in a separate transaction. The drawback is that only a subset of migrations may apply successfully if a later one fails. That should be mitigated with proper rollback / restore processes.
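To make the trade-off concrete, here is a minimal sketch of the two strategies. It is not dstack's migration code: it uses the stdlib `sqlite3` module instead of Postgres, and the migration list, the `schema_version` table, and all function names are hypothetical. It only illustrates partial application, not the deadlock itself. For Alembic, `context.configure()` accepts a `transaction_per_migration=True` option that looks like the natural way to get the per-migration behaviour, though whether it fits dstack's `env.py` as-is is an assumption.

```python
import sqlite3

# Hypothetical migrations: (revision id, SQL). The third one fails on purpose.
MIGRATIONS = [
    ("rev1", "CREATE TABLE runs (id INTEGER PRIMARY KEY)"),
    ("rev2", "ALTER TABLE runs ADD COLUMN status TEXT"),
    ("rev3", "ALTER TABLE missing_table ADD COLUMN x TEXT"),
]


def new_db():
    """Open an in-memory database with a (hypothetical) version table."""
    # isolation_level=None disables implicit transactions, so BEGIN/COMMIT
    # below are fully explicit.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE schema_version (revision TEXT)")
    return conn


def apply_in_one_transaction(conn, migrations):
    """Current behaviour: all migrations share one transaction (all or nothing)."""
    conn.execute("BEGIN")
    try:
        for revision, statement in migrations:
            conn.execute(statement)
            conn.execute(
                "INSERT INTO schema_version (revision) VALUES (?)", (revision,)
            )
        conn.execute("COMMIT")
    except sqlite3.Error:
        conn.execute("ROLLBACK")
        return []
    return [revision for revision, _ in migrations]


def apply_per_migration(conn, migrations):
    """Proposed behaviour: one transaction per migration; stop at the first failure."""
    applied = []
    for revision, statement in migrations:
        conn.execute("BEGIN")
        try:
            conn.execute(statement)
            conn.execute(
                "INSERT INTO schema_version (revision) VALUES (?)", (revision,)
            )
            conn.execute("COMMIT")
        except sqlite3.Error:
            conn.execute("ROLLBACK")
            break
        applied.append(revision)
    return applied
```

With the per-migration variant, the database is left at the last revision that committed, which is why a rollback / restore process is needed to recover from a failure partway through an upgrade.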
dstack version
master
Server logs
Additional information
No response