handle stale deployment records on redeploy (409 Conflict)#213

Merged
peterj merged 2 commits into main from peterj/fix409
Feb 27, 2026

peterj commented Feb 25, 2026

Description

  • Fix 409 Conflict error when redeploying an agent or MCP server whose
    runtime resources were removed externally but whose database record
    was not cleaned up
  • Add cleanupExistingDeployment helper that removes stale DB records
    and attempts Kubernetes resource cleanup (non-fatal) before retrying
    the deployment insert
  • Remove stray fmt.Println debug statement from DeployServer

Change Type

/kind fix

Changelog

NONE

Additional Notes

Root cause

The deployments table uses PRIMARY KEY (server_name, version). When
runtime resources (e.g. Kubernetes pods) are deleted externally—via
kubectl, namespace cleanup, or failed reconciliation—the corresponding
database record is not removed. Subsequent deploy attempts hit the
unique constraint, returning a 409 Conflict even though no actual
instance exists.

The error path is:
CreateDeployment INSERT → PG error 23505 → ErrAlreadyExists → huma.Error409Conflict

ReconcileAll cannot fix this because it runs after the INSERT
succeeds—it never gets a chance to execute when the INSERT itself fails.

Fix

In DeployServer and DeployAgent, when CreateDeployment returns
ErrAlreadyExists:

  1. Look up the existing deployment record
  2. Attempt Kubernetes resource cleanup (non-fatal, since resources may
    already be gone)
  3. Remove the stale DB record
  4. Retry CreateDeployment
  5. Proceed with ReconcileAll to create fresh runtime resources

Signed-off-by: Peter Jausovec <[email protected]>
peterj merged commit 0ac477b into main on Feb 27, 2026
5 checks passed
christian-posta pushed a commit to christian-posta/agentregistry that referenced this pull request Mar 9, 2026
…stry-dev#213)
