[Bug]: Upgrading dstack server may lead to Postgres deadlocks in multi-replica deployments #3219

@r4victor

Description

Steps to reproduce

  1. Have a dstack server multi-replica deployment
  2. Try upgrading the server version skipping many releases (e.g. from 0.19.20 to 0.19.34)

This may lead to Postgres deadlocks if there is enough background processing going on and you're unlucky.

Actual behaviour

The reason is that all alembic migrations are currently applied in one transaction on deploy:

with context.begin_transaction():

And while individual migrations try to respect lock ordering on tables, the order is not respected across many migrations.
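To illustrate how lock ordering can hold within each migration yet break across a batch, here is a minimal sketch. The table names, the canonical order, and both helper functions are hypothetical and are not taken from the dstack codebase; the sketch only models the order in which a single transaction first acquires table locks.

```python
# Hypothetical canonical lock order that background workers are assumed to follow.
CANONICAL_ORDER = ["runs", "jobs", "instances"]


def acquisition_order(migrations):
    """Return the order in which one transaction first locks each table."""
    seen = []
    for tables in migrations:
        for table in tables:
            if table not in seen:
                seen.append(table)
    return seen


def respects_order(tables):
    """Check that tables are locked in non-decreasing canonical rank."""
    ranks = [CANONICAL_ORDER.index(t) for t in tables]
    return ranks == sorted(ranks)


# Two hypothetical migrations, each listing the tables it locks, in order.
migrations = [["jobs"], ["runs", "jobs"]]

# Each migration on its own follows the canonical order.
each_ok = [respects_order(acquisition_order([m])) for m in migrations]

# Run together in one transaction, "jobs" is locked before "runs".
combined_ok = respects_order(acquisition_order(migrations))

print(each_ok, combined_ok)
```

In this model the combined transaction holds a lock on `jobs` while it waits for `runs`, so a concurrent worker that locks `runs` and then `jobs` can deadlock with it.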

Expected behaviour

The solution is to run each migration in a separate transaction. The drawback is that an upgrade can end up partially applied: if a later migration fails, the earlier ones stay committed. That should be mitigated with proper rollback / restore processes.
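Alembic's `context.configure()` accepts a `transaction_per_migration=True` option for this purpose; whether the fix uses it is not stated here. The sketch below does not use Alembic. It is a standard-library illustration, with hypothetical migration names and an in-memory SQLite database, of both the per-migration transaction pattern and the partial-apply drawback.

```python
import sqlite3


def apply_migrations(conn, migrations):
    """Apply each migration in its own transaction; return the names that committed."""
    applied = []
    for name, statements in migrations:
        try:
            conn.execute("BEGIN")
            for sql in statements:
                conn.execute(sql)
            conn.execute("COMMIT")
        except sqlite3.Error:
            # Only the failing migration is rolled back; earlier ones stay committed.
            conn.execute("ROLLBACK")
            break
        applied.append(name)
    return applied


# Hypothetical migrations; the second one fails on its last statement.
migrations = [
    ("001_create_runs", ["CREATE TABLE runs (id INTEGER PRIMARY KEY)"]),
    ("002_broken", [
        "CREATE TABLE jobs (id INTEGER PRIMARY KEY)",
        "ALTER TABLE missing ADD COLUMN x INTEGER",
    ]),
    ("003_create_instances", ["CREATE TABLE instances (id INTEGER PRIMARY KEY)"]),
]

# isolation_level=None puts sqlite3 in autocommit mode so BEGIN/COMMIT are explicit.
conn = sqlite3.connect(":memory:", isolation_level=None)
applied = apply_migrations(conn, migrations)
tables = sorted(
    row[0] for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
)
print(applied, tables)
```

Here the first migration stays committed, the second is rolled back as a whole, and the third never runs, which is the partially upgraded state that a rollback / restore process has to handle.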

dstack version

master

Server logs

Additional information

No response
