Skip to content

fix: prisma migrate deploy failures on pre-existing instances#23655

Merged
8 commits merged intomainfrom
krrishdholakia/fix-prisma-migrate
Mar 14, 2026
Merged

fix: prisma migrate deploy failures on pre-existing instances#23655
8 commits merged intomainfrom
krrishdholakia/fix-prisma-migrate

Conversation

@ghost
Copy link
Copy Markdown

@ghost ghost commented Mar 14, 2026

Summary

Fixes failed migrations when running prisma migrate deploy on pre-existing litellm instances. The migration recovery logic had multiple bugs that caused it to fail silently even after successfully resolving idempotent errors.

Root Causes

  1. Missing return in recovery handler - The P3018 idempotent error handler logged success but never returned True, causing the setup to report failure even when the final recovery attempt succeeded
  2. Unhandled exceptions - _roll_back_migration threw exceptions that escaped the P3018 handler, preventing _resolve_specific_migration from running
  3. Non-idempotent SQL - Migration statements used ADD COLUMN / DROP COLUMN without IF [NOT] EXISTS, failing immediately on pre-existing instances where the schema was already modified

Changes

  • Added return True after successful P3018 idempotent error recovery
  • Wrapped rollback in try/except to allow recovery to continue even if rollback fails
  • Made migration SQL idempotent with IF NOT EXISTS / IF EXISTS clauses

🧹 Bug Fix

Fixes failed migrations due to idempotent schema changes on pre-existing litellm instances.

Problems:
1. P3018 recovery handler never returned True on successful resolution, causing "Database setup failed after multiple retries" even when the final recovery succeeded
2. _roll_back_migration exceptions escaped the P3018 handler, preventing _resolve_specific_migration from running
3. Migration SQL used ADD COLUMN/DROP COLUMN without IF [NOT] EXISTS, failing if schema was already modified

Changes:
- Add return True after successful P3018 idempotent error recovery
- Wrap _roll_back_migration in try/except to allow recovery continuation even if rollback fails
- Make migration.sql idempotent with IF NOT EXISTS / IF EXISTS clauses

Co-Authored-By: Claude Haiku 4.5 <[email protected]>
@vercel
Copy link
Copy Markdown

vercel bot commented Mar 14, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
litellm Ready Ready Preview, Comment Mar 14, 2026 11:32pm

Request Review

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps bot commented Mar 14, 2026

Greptile Summary

This PR fixes three migration recovery bugs in litellm_proxy_extras/utils.py that caused prisma migrate deploy to fail silently on pre-existing LiteLLM instances, and hardens all 99+ migration SQL files to use idempotent DDL.

Key changes:

  • utils.py — Removes the erroneous return True from the P3009 idempotent-error handler; without it the outer for attempt in range(4) loop now correctly re-runs prisma migrate deploy to pick up any remaining pending migrations. Both _roll_back_migration and _resolve_specific_migration are individually wrapped in try/except (CalledProcessError, TimeoutExpired) so a failed rollback no longer prevents the resolve step from running.
  • Migration SQL files — All DDL statements (CREATE TABLE, ADD/DROP COLUMN, CREATE/DROP INDEX, ADD/DROP CONSTRAINT, RENAME COLUMN) are updated to use IF [NOT] EXISTS guards or DO $$ conditional blocks, making every migration safe to re-run.
  • proxy_cli.py — Adds a --skip_db_migration_check opt-in flag that emits a warning and continues startup rather than exiting with code 1 when migration fails; default fail-fast behaviour is preserved, which is backwards compatible. The warning message in the skip_db_migration_check branch is self-contradictory (see inline comment).
  • TestsTestMigrationSQLIdempotency adds seven test methods that scan every migration file for bare CREATE TABLE, ADD COLUMN, DROP COLUMN, CREATE INDEX, DROP INDEX, RENAME COLUMN, and DROP CONSTRAINT statements, providing a regression gate for all future migrations.

Confidence Score: 4/5

  • Safe to merge after fixing the misleading warning message in proxy_cli.py; core migration recovery logic and SQL idempotency changes are correct.
  • The three root-cause fixes in utils.py are logically sound and address real bugs. All migration SQL files reviewed are correctly updated with IF [NOT] EXISTS guards. The new test suite provides strong regression coverage. The only issue found is a self-contradictory warning message in proxy_cli.py (line 871) that tells the operator to pass a flag they have already passed — this is a minor UX bug but not a correctness or safety issue.
  • litellm/proxy/proxy_cli.py — the warning message inside the skip_db_migration_check branch is contradictory and should be corrected.

Important Files Changed

Filename Overview
litellm-proxy-extras/litellm_proxy_extras/utils.py Fixes three migration recovery bugs: removes erroneous return True after P3009 resolution (allowing the outer retry loop to re-run prisma migrate deploy for remaining migrations), wraps both _roll_back_migration and _resolve_specific_migration in try/except to prevent exceptions from aborting recovery, and adds subprocess.TimeoutExpired to the caught exception types. Logic is sound; the outer for attempt in range(4) loop will automatically retry after each resolution step.
litellm/proxy/proxy_cli.py Adds --skip_db_migration_check flag to allow startup to continue despite migration failures. Default behavior (exit on failure) is preserved, maintaining backwards compatibility. Contains a misleading warning message that tells the user to pass a flag they have already passed.
tests/litellm-proxy-extras/test_litellm_proxy_extras_utils.py Adds TestMigrationSQLIdempotency class with seven test methods enforcing IF [NOT] EXISTS guards on CREATE TABLE, ADD COLUMN, DROP COLUMN, CREATE INDEX, DROP INDEX, RENAME COLUMN, and ADD/DROP CONSTRAINT across all migration files. Tests run against the full migration corpus, providing strong regression coverage for this fix.
litellm-proxy-extras/litellm_proxy_extras/migrations/20260303000000_update_tool_table_policies/migration.sql RENAME COLUMN "call_policy" TO "input_policy" is correctly wrapped in a DO $$ IF EXISTS block. All ADD COLUMN statements use IF NOT EXISTS. Index operations use IF [NOT] EXISTS. Fully idempotent.
litellm-proxy-extras/litellm_proxy_extras/migrations/20250603210143_cascade_budget_changes/migration.sql Both DROP CONSTRAINT and ADD CONSTRAINT are wrapped in DO $$ blocks with IF EXISTS / IF NOT EXISTS guards. Fully idempotent.
litellm-proxy-extras/litellm_proxy_extras/migrations/20250326162113_baseline/migration.sql Baseline migration updated to use CREATE TABLE IF NOT EXISTS throughout, making re-runs safe on pre-existing schemas.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[setup_database called] --> B{for attempt in range 4}
    B --> C[prisma migrate deploy]
    C -->|success| D[post-migration sanity check]
    D --> E[return True ✅]
    C -->|P3009 failed migration| F{idempotent error?}
    F -->|yes| G[try: _roll_back_migration]
    G -->|CalledProcessError / TimeoutExpired| H[log warning, continue]
    G -->|success| I[try: _resolve_specific_migration]
    H --> I
    I -->|CalledProcessError / TimeoutExpired| J[log warning]
    I -->|success| K[log resolved]
    J --> B
    K --> B
    F -->|no| L[mark rolled-back, retry]
    L --> B
    C -->|P3005 non-empty schema| M[create baseline + resolve all]
    M --> E
    C -->|P3018 permission error| N[rollback + raise RuntimeError ❌]
    C -->|P3018 idempotent| O[try: rollback + resolve]
    O --> B
    B -->|4 attempts exhausted| P[return False ❌]
Loading

Comments Outside Diff (1)

  1. litellm/proxy/proxy_cli.py, line 869-872 (link)

    Warning message is self-contradictory

    This code is inside if skip_db_migration_check:, which means the operator has already passed --skip_db_migration_check. The message incorrectly instructs them to "Pass --skip_db_migration_check to allow this" when they've already done so. Any operator reading this will be confused.

Last reviewed commit: 8e1ed91

Comment on lines +524 to +528
except subprocess.CalledProcessError as rollback_err:
logger.warning(
f"Failed to roll back migration {migration_name}: {rollback_err}. "
f"It may already be in a rolled-back state."
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

subprocess.TimeoutExpired not caught in rollback handler

_roll_back_migration calls subprocess.run(..., timeout=60, check=True), which can raise either subprocess.CalledProcessError (non-zero exit) or subprocess.TimeoutExpired (if the command takes >60 s on a slow/loaded database). The new try/except only catches CalledProcessError, so a TimeoutExpired will still escape, skip _resolve_specific_migration, and cause the recovery to fail — the exact bug this PR aims to fix.

The equivalent rollback in the permission-error path (line ~494) correctly uses except Exception to guard against all failure modes.

Suggested change
except subprocess.CalledProcessError as rollback_err:
logger.warning(
f"Failed to roll back migration {migration_name}: {rollback_err}. "
f"It may already be in a rolled-back state."
)
except (subprocess.CalledProcessError, subprocess.TimeoutExpired) as rollback_err:
logger.warning(
f"Failed to roll back migration {migration_name}: {rollback_err}. "
f"It may already be in a rolled-back state."
)

Adds TestMigrationSQLIdempotency test class that statically validates all
migration SQL files created after 2026-03-11 use idempotent DDL:
- ADD COLUMN must use IF NOT EXISTS
- DROP COLUMN must use IF EXISTS
- DROP INDEX must use IF EXISTS
- CREATE INDEX must use IF NOT EXISTS

This prevents the class of errors where prisma migrate deploy fails on
pre-existing instances because the schema was already modified.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c14b574737

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

migration_name
)
logger.info("✅ Migration resolved.")
return True
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Continue migration deploy after resolving one failed step

Returning True immediately after _resolve_specific_migration() marks only the currently failing migration as applied and exits setup_database, so any later pending migrations are never deployed in this startup run. This occurs when prisma migrate deploy hits an idempotent P3018 on migration N but N+1+ migrations are still pending (common on version-skipping upgrades), causing startup to report success while the schema remains partially migrated.

Useful? React with 👍 / 👎.

_roll_back_migration uses subprocess.run with timeout=60, so it can raise
subprocess.TimeoutExpired in addition to CalledProcessError. Without
catching this, a slow database during rollback would escape the handler
and bypass _resolve_specific_migration — the same class of bug.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Comment on lines +175 to +197
# Non-idempotent DDL patterns that should use IF [NOT] EXISTS
_UNSAFE_PATTERNS = [
# ADD COLUMN without IF NOT EXISTS
(
r"ADD\s+COLUMN\s+(?!IF\s+NOT\s+EXISTS)",
"ADD COLUMN without IF NOT EXISTS",
),
# DROP COLUMN without IF EXISTS
(
r"DROP\s+COLUMN\s+(?!IF\s+EXISTS)",
"DROP COLUMN without IF EXISTS",
),
# DROP INDEX without IF EXISTS (standalone statement, not inside ALTER TABLE)
(
r"DROP\s+INDEX\s+(?!IF\s+EXISTS)(?!.*ON)",
"DROP INDEX without IF EXISTS",
),
# CREATE INDEX without IF NOT EXISTS
(
r"CREATE\s+(?:UNIQUE\s+)?INDEX\s+(?!IF\s+NOT\s+EXISTS)(?!CONCURRENTLY)",
"CREATE INDEX without IF NOT EXISTS",
),
]
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_UNSAFE_PATTERNS is dead code

The _UNSAFE_PATTERNS list is defined here but never referenced anywhere — each of the four TestMigrationSQLIdempotency test methods re-implements its own inline regex check independently. The variable serves no purpose and can mislead future maintainers into thinking it drives the tests.

Additionally, there's a subtle inconsistency: the DROP INDEX entry uses (?!.*ON) to try to exclude MySQL-style DROP INDEX name ON table syntax, and the CREATE INDEX entry uses (?!CONCURRENTLY) to skip CONCURRENTLY indexes — but neither of these exclusions appears in the corresponding test methods, meaning _UNSAFE_PATTERNS and the actual enforcement logic are already diverged.

Either delete the variable, or replace the four duplicate inline-regex blocks in the test methods with a single parametrized test driven by _UNSAFE_PATTERNS.

Comment on lines +209 to +216
@pytest.fixture(scope="class")
def recent_migrations(self):
migrations = _get_recent_migrations()
assert len(migrations) > 0, (
f"No migrations found at or after cutoff {_IDEMPOTENT_SQL_CUTOFF}. "
f"Check that _MIGRATIONS_DIR ({_MIGRATIONS_DIR}) is correct."
)
return migrations
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hard assert in fixture will fail instead of skip when path is wrong

The recent_migrations fixture raises AssertionError (i.e., a test failure, not a skip) when no migration files are found. If _MIGRATIONS_DIR is incorrect in some CI environment — or the migrations directory is cleaned — all four tests will be reported as failures rather than skipped/xfailed, which makes it harder to distinguish a configuration problem from a real regression.

Consider using pytest.skip when the directory is missing or empty:

Suggested change
@pytest.fixture(scope="class")
def recent_migrations(self):
migrations = _get_recent_migrations()
assert len(migrations) > 0, (
f"No migrations found at or after cutoff {_IDEMPOTENT_SQL_CUTOFF}. "
f"Check that _MIGRATIONS_DIR ({_MIGRATIONS_DIR}) is correct."
)
return migrations
@pytest.fixture(scope="class")
def recent_migrations(self):
migrations = _get_recent_migrations()
if len(migrations) == 0:
pytest.skip(
f"No migrations found at or after cutoff {_IDEMPOTENT_SQL_CUTOFF}. "
f"Check that _MIGRATIONS_DIR ({_MIGRATIONS_DIR}) is correct."
)
return migrations

Fixed all existing migration files to use IF [NOT] EXISTS for DDL
statements (ADD COLUMN, DROP COLUMN, DROP INDEX, CREATE INDEX).
Removed the date cutoff from the idempotency tests so they now
validate all migrations, not just recent ones.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@@ -2,19 +2,19 @@
ALTER TABLE "LiteLLM_ToolTable" RENAME COLUMN "call_policy" TO "input_policy";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RENAME COLUMN is not idempotent on pre-existing instances

This PR makes every ADD COLUMN, DROP COLUMN, and index DDL statement idempotent with IF [NOT] EXISTS, but the bare RENAME COLUMN "call_policy" TO "input_policy" on line 2 is left unguarded. On any pre-existing instance where call_policy was already renamed (e.g., the migration ran once and was then re-attempted), PostgreSQL raises ERROR: column "call_policy" does not exist, which would trip the same failure mode this PR is fixing.

The migration 20250806095134_rename_alias_to_server_name_mcp_table already shows the correct pattern:

DO $$
BEGIN
    IF EXISTS (SELECT 1 FROM information_schema.columns WHERE table_name = 'LiteLLM_ToolTable' AND column_name = 'call_policy') THEN
        ALTER TABLE "LiteLLM_ToolTable" RENAME COLUMN "call_policy" TO "input_policy";
    END IF;
END $$;

This guards the rename with an existence check so re-running it is safe.

Comment on lines +175 to +241
def all_migrations(self):
migrations = _get_all_migrations()
assert len(migrations) > 0, (
f"No migrations found. "
f"Check that _MIGRATIONS_DIR ({_MIGRATIONS_DIR}) is correct."
)
return migrations

def test_add_column_uses_if_not_exists(self, all_migrations):
"""ADD COLUMN statements must use IF NOT EXISTS"""
violations = []
for migration_name, sql in all_migrations:
for line_num, line in enumerate(sql.splitlines(), 1):
if re.search(r"ADD\s+COLUMN\s+", line, re.IGNORECASE) and not re.search(
r"ADD\s+COLUMN\s+IF\s+NOT\s+EXISTS", line, re.IGNORECASE
):
violations.append(f" {migration_name}:{line_num}: {line.strip()}")
assert not violations, (
"ADD COLUMN without IF NOT EXISTS found in recent migrations:\n"
+ "\n".join(violations)
)

def test_drop_column_uses_if_exists(self, all_migrations):
"""DROP COLUMN statements must use IF EXISTS"""
violations = []
for migration_name, sql in all_migrations:
for line_num, line in enumerate(sql.splitlines(), 1):
if re.search(r"DROP\s+COLUMN\s+", line, re.IGNORECASE) and not re.search(
r"DROP\s+COLUMN\s+IF\s+EXISTS", line, re.IGNORECASE
):
violations.append(f" {migration_name}:{line_num}: {line.strip()}")
assert not violations, (
"DROP COLUMN without IF EXISTS found in recent migrations:\n"
+ "\n".join(violations)
)

def test_drop_index_uses_if_exists(self, all_migrations):
"""DROP INDEX statements must use IF EXISTS"""
violations = []
for migration_name, sql in all_migrations:
for line_num, line in enumerate(sql.splitlines(), 1):
if re.search(r"DROP\s+INDEX\s+", line, re.IGNORECASE) and not re.search(
r"DROP\s+INDEX\s+IF\s+EXISTS", line, re.IGNORECASE
):
violations.append(f" {migration_name}:{line_num}: {line.strip()}")
assert not violations, (
"DROP INDEX without IF EXISTS found in recent migrations:\n"
+ "\n".join(violations)
)

def test_create_index_uses_if_not_exists(self, all_migrations):
"""CREATE INDEX statements must use IF NOT EXISTS"""
violations = []
for migration_name, sql in all_migrations:
for line_num, line in enumerate(sql.splitlines(), 1):
if re.search(
r"CREATE\s+(?:UNIQUE\s+)?INDEX\s+", line, re.IGNORECASE
) and not re.search(
r"CREATE\s+(?:UNIQUE\s+)?INDEX\s+(?:CONCURRENTLY\s+)?IF\s+NOT\s+EXISTS",
line,
re.IGNORECASE,
):
violations.append(f" {migration_name}:{line_num}: {line.strip()}")
assert not violations, (
"CREATE INDEX without IF NOT EXISTS found in recent migrations:\n"
+ "\n".join(violations)
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestMigrationSQLIdempotency has no coverage for RENAME COLUMN

The four test methods enforce idempotency for ADD COLUMN, DROP COLUMN, DROP INDEX, and CREATE INDEX, but there is no test for bare RENAME COLUMN statements — which are also non-idempotent. The migration 20260303000000_update_tool_table_policies contains an unguarded RENAME COLUMN that would fail on a pre-existing instance.

Adding a fifth method would catch regressions of this class:

def test_rename_column_uses_do_block(self, all_migrations):
    """RENAME COLUMN must be wrapped in a conditional DO $$ block"""
    violations = []
    for migration_name, sql in all_migrations:
        for line_num, line in enumerate(sql.splitlines(), 1):
            if re.search(r"RENAME\s+COLUMN\s+", line, re.IGNORECASE):
                # Bare RENAME COLUMN outside a DO block is not idempotent
                # Check that it appears inside a DO $$ ... END $$ block
                if not re.search(r"^\s*(IF\s+EXISTS|DO\s+\$)", line, re.IGNORECASE):
                    violations.append(f"  {migration_name}:{line_num}: {line.strip()}")
    assert not violations, (
        "RENAME COLUMN outside a conditional DO $$ block found in migrations:\n"
        + "\n".join(violations)
    )

…gration flag

By default the proxy now warns and continues when database migration
fails. Pass --require_db_migration (or set REQUIRE_DB_MIGRATION=true)
to restore the previous behavior of exiting with an error.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
Comment on lines 865 to +878
if not PrismaManager.setup_database(
use_migrate=not use_prisma_db_push
):
print( # noqa
"\033[1;31mLiteLLM Proxy: Database setup failed after multiple retries. "
"The proxy cannot start safely. Please check your database connection and migration status.\033[0m"
)
sys.exit(1)
if require_db_migration:
print( # noqa
"\033[1;31mLiteLLM Proxy: Database setup failed after multiple retries. "
"The proxy cannot start safely. Please check your database connection and migration status.\033[0m"
)
sys.exit(1)
else:
print( # noqa
"\033[1;33mLiteLLM Proxy: Database migration failed but continuing startup. "
"Use --require_db_migration to exit on migration failure.\033[0m"
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Default behavior changed from fail-safe to fail-open

The proxy previously exited with code 1 whenever setup_database() returned False. This PR flips the default: migration failures are now silently swallowed and startup continues unless the operator explicitly passes --require_db_migration.

This is a backwards-incompatible behavior change that could leave production deployments in a broken state:

  • The database schema may be incomplete (missing columns, tables, or indexes that the application code expects)
  • The proxy starts up and serves traffic, but runtime errors then surface for requests that hit the affected schema objects
  • Automated deployments that relied on the old fail-fast behavior to catch migration problems now silently proceed

Operators who aren't aware of this flag change won't know to add --require_db_migration to their startup command. Consider keeping the old fail-fast behavior as the default and instead adding a --ignore_db_migration_failure flag for operators who explicitly want the lenient behavior.

Rule Used: What: avoid backwards-incompatible changes without... (source)

…UMN and ADD CONSTRAINT

Three fixes:

1. _resolve_specific_migration in the P3018 handler was not wrapped in
   try/except, so failures there would bypass the return True and
   propagate unexpectedly — partially defeating the rollback fix.

2. Bare RENAME COLUMN in 20260303000000_update_tool_table_policies was
   non-idempotent. Wrapped in DO $$ IF EXISTS block. Also wrapped all
   28 bare ADD CONSTRAINT statements across 9 migration files in
   DO $$ IF NOT EXISTS (pg_constraint) blocks.

3. Added test_rename_column_is_guarded and test_add_constraint_is_guarded
   to TestMigrationSQLIdempotency for full DDL coverage.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@ghost
Copy link
Copy Markdown
Author

ghost commented Mar 14, 2026

@greptile please re-review

migration_name
)
logger.info("✅ Migration resolved.")
return True
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

return True skips remaining pending migrations

After resolving a single idempotent migration, the code returns True immediately. But _resolve_specific_migration only marks that one migration as applied — it does not apply any other pending migrations. If there are N migrations pending, resolving one and returning True causes the proxy to start up thinking the schema is fully up-to-date, when migrations 2…N have never actually run.

continue would be the correct replacement: it lets the for attempt in range(3) loop execute another prisma migrate deploy, which will now skip the resolved migration and proceed with the remaining ones:

Suggested change
return True
return True

should instead be:

                                        logger.info("✅ Migration resolved, retrying migrate deploy for remaining migrations.")
                                        # break inner error-handling and retry the outer migrate deploy loop
                                        break  # or continue the outer loop

More concretely, replacing return True with continue (if the outer for attempt in range(3) loop is accessible) or restructuring so the outer loop retries would ensure all pending migrations are applied before the function signals success.

@@ -2,5 +2,10 @@
ALTER TABLE "LiteLLM_TeamMembership" DROP CONSTRAINT "LiteLLM_TeamMembership_budget_id_fkey";
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Non-idempotent DROP CONSTRAINT left unguarded

This PR wraps ADD CONSTRAINT in DO $$ IF NOT EXISTS blocks for idempotency, but the DROP CONSTRAINT directly above (line 2) was not updated. If this migration partially executes — e.g., the DROP CONSTRAINT succeeds but something fails before the migration is marked applied — then any retry attempt will fail with ERROR: constraint "LiteLLM_TeamMembership_budget_id_fkey" of relation "LiteLLM_TeamMembership" does not exist.

The same issue exists in 20250625145206_cascade_budget_and_loosen_managed_file_json/migration.sql:2.

The fix is to wrap the drop in a DO $$ block with an existence check, mirroring the pattern used for the re-add:

-- DropForeignKey
DO $$
BEGIN
    IF EXISTS (SELECT 1 FROM pg_constraint WHERE conname = 'LiteLLM_TeamMembership_budget_id_fkey') THEN
        ALTER TABLE "LiteLLM_TeamMembership" DROP CONSTRAINT "LiteLLM_TeamMembership_budget_id_fkey";
    END IF;
END $$;

This matches the same pg_constraint lookup used for the idempotent ADD CONSTRAINT block below it.

Comment on lines +175 to +275
def all_migrations(self):
migrations = _get_all_migrations()
assert len(migrations) > 0, (
f"No migrations found. "
f"Check that _MIGRATIONS_DIR ({_MIGRATIONS_DIR}) is correct."
)
return migrations

def test_add_column_uses_if_not_exists(self, all_migrations):
"""ADD COLUMN statements must use IF NOT EXISTS"""
violations = []
for migration_name, sql in all_migrations:
for line_num, line in enumerate(sql.splitlines(), 1):
if re.search(r"ADD\s+COLUMN\s+", line, re.IGNORECASE) and not re.search(
r"ADD\s+COLUMN\s+IF\s+NOT\s+EXISTS", line, re.IGNORECASE
):
violations.append(f" {migration_name}:{line_num}: {line.strip()}")
assert not violations, (
"ADD COLUMN without IF NOT EXISTS found in recent migrations:\n"
+ "\n".join(violations)
)

def test_drop_column_uses_if_exists(self, all_migrations):
"""DROP COLUMN statements must use IF EXISTS"""
violations = []
for migration_name, sql in all_migrations:
for line_num, line in enumerate(sql.splitlines(), 1):
if re.search(r"DROP\s+COLUMN\s+", line, re.IGNORECASE) and not re.search(
r"DROP\s+COLUMN\s+IF\s+EXISTS", line, re.IGNORECASE
):
violations.append(f" {migration_name}:{line_num}: {line.strip()}")
assert not violations, (
"DROP COLUMN without IF EXISTS found in recent migrations:\n"
+ "\n".join(violations)
)

def test_drop_index_uses_if_exists(self, all_migrations):
"""DROP INDEX statements must use IF EXISTS"""
violations = []
for migration_name, sql in all_migrations:
for line_num, line in enumerate(sql.splitlines(), 1):
if re.search(r"DROP\s+INDEX\s+", line, re.IGNORECASE) and not re.search(
r"DROP\s+INDEX\s+IF\s+EXISTS", line, re.IGNORECASE
):
violations.append(f" {migration_name}:{line_num}: {line.strip()}")
assert not violations, (
"DROP INDEX without IF EXISTS found in recent migrations:\n"
+ "\n".join(violations)
)

def test_create_index_uses_if_not_exists(self, all_migrations):
"""CREATE INDEX statements must use IF NOT EXISTS"""
violations = []
for migration_name, sql in all_migrations:
for line_num, line in enumerate(sql.splitlines(), 1):
if re.search(
r"CREATE\s+(?:UNIQUE\s+)?INDEX\s+", line, re.IGNORECASE
) and not re.search(
r"CREATE\s+(?:UNIQUE\s+)?INDEX\s+(?:CONCURRENTLY\s+)?IF\s+NOT\s+EXISTS",
line,
re.IGNORECASE,
):
violations.append(f" {migration_name}:{line_num}: {line.strip()}")
assert not violations, (
"CREATE INDEX without IF NOT EXISTS found in recent migrations:\n"
+ "\n".join(violations)
)

def test_rename_column_is_guarded(self, all_migrations):
"""RENAME COLUMN must be inside a DO $$ IF EXISTS block"""
violations = []
for migration_name, sql in all_migrations:
lines = sql.splitlines()
in_do_block = False
for line_num, line in enumerate(lines, 1):
if re.search(r"DO\s+\$\$", line, re.IGNORECASE):
in_do_block = True
if re.search(r"END\s+\$\$", line, re.IGNORECASE):
in_do_block = False
if re.search(r"RENAME\s+COLUMN\s+", line, re.IGNORECASE) and not in_do_block:
violations.append(f" {migration_name}:{line_num}: {line.strip()}")
assert not violations, (
"RENAME COLUMN without DO $$ IF EXISTS guard found in migrations:\n"
+ "\n".join(violations)
)

def test_add_constraint_is_guarded(self, all_migrations):
"""ADD CONSTRAINT must be inside a DO $$ IF NOT EXISTS block"""
violations = []
for migration_name, sql in all_migrations:
lines = sql.splitlines()
in_do_block = False
for line_num, line in enumerate(lines, 1):
if re.search(r"DO\s+\$\$", line, re.IGNORECASE):
in_do_block = True
if re.search(r"END\s+\$\$", line, re.IGNORECASE):
in_do_block = False
if re.search(r"ADD\s+CONSTRAINT\s+", line, re.IGNORECASE) and not in_do_block:
violations.append(f" {migration_name}:{line_num}: {line.strip()}")
assert not violations, (
"ADD CONSTRAINT without DO $$ IF NOT EXISTS guard found in migrations:\n"
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No test coverage for DROP CONSTRAINT idempotency

TestMigrationSQLIdempotency now covers ADD COLUMN, DROP COLUMN, DROP INDEX, CREATE INDEX, RENAME COLUMN, and ADD CONSTRAINT, but there is no test for bare DROP CONSTRAINT without an IF EXISTS guard or a DO $$ block wrapper. Two migration files changed in this PR — 20250603210143_cascade_budget_changes and 20250625145206_cascade_budget_and_loosen_managed_file_json — contain unguarded ALTER TABLE ... DROP CONSTRAINT ... statements that a new test method would immediately catch:

def test_drop_constraint_is_guarded(self, all_migrations):
    """DROP CONSTRAINT must use IF EXISTS or be wrapped in a DO $$ block"""
    violations = []
    for migration_name, sql in all_migrations:
        lines = sql.splitlines()
        in_do_block = False
        for line_num, line in enumerate(lines, 1):
            if re.search(r"DO\s+\$\$", line, re.IGNORECASE):
                in_do_block = True
            if re.search(r"END\s+\$\$", line, re.IGNORECASE):
                in_do_block = False
            if (
                re.search(r"DROP\s+CONSTRAINT\s+", line, re.IGNORECASE)
                and not re.search(r"DROP\s+CONSTRAINT\s+IF\s+EXISTS", line, re.IGNORECASE)
                and not in_do_block
            ):
                violations.append(f"  {migration_name}:{line_num}: {line.strip()}")
    assert not violations, (
        "DROP CONSTRAINT without IF EXISTS or DO $$ guard found in migrations:\n"
        + "\n".join(violations)
    )

Three fixes:

1. Both P3009 and P3018 idempotent handlers returned True after
   resolving a single migration, exiting before remaining pending
   migrations were applied. Now they continue the retry loop so
   prisma migrate deploy runs again for any remaining migrations.

2. Two migration files had bare DROP CONSTRAINT without a DO $$ IF
   EXISTS guard, which fails if the constraint was already dropped.
   Wrapped both in idempotent DO $$ blocks.

3. Added test_drop_constraint_is_guarded to catch unguarded DROP
   CONSTRAINT in future migrations.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
…default

Four fixes:

1. P3009 idempotent handler now has the same try/except around
   _roll_back_migration and _resolve_specific_migration as the P3018
   handler. Previously a rollback or resolve failure in the P3009 path
   would propagate and leave the migration unresolved.

2. Added IF NOT EXISTS to all 57 bare CREATE TABLE statements across
   34 migration files. Added test_create_table_uses_if_not_exists to
   catch this pattern.

3. Reverted the backwards-incompatible default behavior change: the
   proxy now fails fast on migration failure (original behavior).
   Added --skip_db_migration_check / SKIP_DB_MIGRATION_CHECK to
   opt into warn-and-continue instead.

Co-Authored-By: Claude Opus 4.6 <[email protected]>
@ghost ghost merged commit b96f033 into main Mar 14, 2026
35 of 37 checks passed
This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

0 participants