feat(agents): control plane impl for llm providers v2#2509
feat(agents): control plane impl for llm providers v2#2509jordan-umusu merged 10 commits intomainfrom
Conversation
b9318a8 to
b185e7c
Compare
6ae521f to
4ecff10
Compare
29ee760 to
665fd5b
Compare
4ecff10 to
5dad6bc
Compare
7981aa4 to
5bb4db8
Compare
f7e5329 to
1067cbe
Compare
1067cbe to
1416aab
Compare
|
@codex review |
|
@cubic review |
@jordan-umusu I have started the AI code review. It will take a few minutes to complete. |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 1416aabc57
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
There was a problem hiding this comment.
9 issues found across 20 files
Confidence score: 2/5
- High-risk user impact is likely:
tracecat/agent/access/service.pycan raiseMissingGreenletafter commit without a refresh, andtracecat/agent/provider/service.pycan silently drop existing secret fields during partial updates (data loss/regression risk). - Pagination behavior has concrete correctness issues in
tracecat/agent/catalog/service.py(tie-breaker omission can skip rows) andtracecat/agent/catalog/router.py(always returningnext_cursor=Nonecan make items beyondlimitunreachable). - Several router/service error-mapping and integrity gaps (
tracecat/agent/access/router.py,tracecat/agent/access/service.py,tracecat/agent/provider/router.py) are likely to surface as 500s or raw DB errors instead of controlled 4xx/domain responses. - Pay close attention to
tracecat/agent/access/service.py,tracecat/agent/provider/service.py,tracecat/agent/catalog/service.py,tracecat/agent/catalog/router.py- these contain the highest-confidence breakage paths (runtime async ORM failure, config overwrite, and pagination correctness).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="tracecat/agent/provider/schemas.py">
<violation number="1" location="tracecat/agent/provider/schemas.py:50">
P2: Add max-length validation to update fields to match create/DB constraints and prevent oversized PATCH payloads from failing at DB commit.</violation>
</file>
<file name="tracecat/agent/service.py">
<violation number="1" location="tracecat/agent/service.py:511">
P2: Default model selection is written with two independent commits, which can leave settings inconsistent if the second write fails.</violation>
</file>
<file name="tracecat/agent/provider/router.py">
<violation number="1" location="tracecat/agent/provider/router.py:52">
P2: Handle invalid pagination cursors in the router and return HTTP 400 instead of letting `ValueError` propagate.
(Based on your team's feedback about translating known validation/domain errors at router boundaries.) [FEEDBACK_USED]</violation>
</file>
<file name="tracecat/agent/catalog/service.py">
<violation number="1" location="tracecat/agent/catalog/service.py:156">
P1: Cursor pagination can skip rows with identical `created_at` values because the continuation filter ignores the `id` tie-breaker.</violation>
</file>
<file name="tracecat/agent/catalog/router.py">
<violation number="1" location="tracecat/agent/catalog/router.py:214">
P1: This endpoint truncates results but always returns `next_cursor=None`, which can make workspace models beyond `limit` permanently unreachable to clients.</violation>
</file>
<file name="tracecat/agent/access/router.py">
<violation number="1" location="tracecat/agent/access/router.py:83">
P1: Handle `TracecatNotFoundError` in `disable_model` and map it to HTTP 404; otherwise missing IDs return 500 via the global exception handler.
(Based on your team's feedback about translating known domain errors at router/global-handler boundaries.) [FEEDBACK_USED]</violation>
</file>
<file name="tracecat/agent/access/service.py">
<violation number="1" location="tracecat/agent/access/service.py:44">
P1: Missing `IntegrityError` handling for the unique constraint on `(organization_id, workspace_id, catalog_id)`. Enabling the same model twice will surface an unhandled `IntegrityError` instead of a meaningful error. Wrap the commit and translate to a domain error so routers can return HTTP 409.
(Based on your team's feedback about returning HTTP 409 for duplicate assignment operations.) [FEEDBACK_USED]</violation>
<violation number="2" location="tracecat/agent/access/service.py:45">
P1: Missing `await self.session.refresh(access)` after commit. In async SQLAlchemy, the commit expires all loaded objects; accessing attributes on `access` in `model_validate` will raise `MissingGreenlet` because implicit lazy loading is not available in async sessions. Every other agent service refreshes the object before returning it.</violation>
</file>
<file name="tracecat/agent/provider/service.py">
<violation number="1" location="tracecat/agent/provider/service.py:52">
P1: Partial secret update overwrites the entire encrypted config, losing previously stored fields. If a user updates only `api_key`, existing `custom_headers` are silently dropped (and vice versa). Decrypt the existing config first, merge the update, then re-encrypt.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
1416aab to
367c8d3
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 367c8d3da5
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
367c8d3 to
81b3774
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 81b3774375
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
3990191 to
055253b
Compare
f5a6d8e to
de58209
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: de58209dbf
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| return self._to_default_model_selection(catalog_entry) | ||
| return None |
There was a problem hiding this comment.
Fall back to legacy default when catalog id is stale
This branch returns None as soon as the stored catalog-id setting does not resolve, so the method never checks the legacy agent_default_model value afterward. Since the legacy set_default_model path still updates only the name setting, mixed usage can leave agent_default_model_catalog_id stale and make /agent/default-model-selection disagree with the effective default model (or become null unexpectedly). When catalog-id lookup fails, continue to the legacy-name fallback (or keep both settings synchronized).
Useful? React with 👍 / 👎.
055253b to
79607d1
Compare
de58209 to
cb79da3
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: cb79da3c4c
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
cb79da3 to
86cdaf2
Compare
86cdaf2 to
843d236
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 843d2367c8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| ) -> DefaultModelSelection: | ||
| """Set the canonical default model selection for the organization.""" | ||
| access_svc = AgentModelAccessService(session=self.session, role=self.role) | ||
| enabled_models = await access_svc.get_org_models() |
There was a problem hiding this comment.
Avoid hidden agent:read requirement in default-model update
set_default_model_selection is gated with agent:update, but it calls AgentModelAccessService.get_org_models(), which is decorated with @require_scope("agent:read"). A role that legitimately has update permissions but not read permissions will get a scope-denied error when setting the default model, even though the route advertises update-only access. This makes the endpoint behavior inconsistent with its declared authorization contract.
Useful? React with 👍 / 👎.
843d236 to
43b3a6b
Compare
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 43b3a6b6dc
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| stmt = insert(AgentCatalog).values(values) | ||
| stmt = stmt.on_conflict_do_update( | ||
| index_elements=[ | ||
| "organization_id", | ||
| "custom_provider_id", | ||
| "model_provider", | ||
| "model_name", | ||
| ], | ||
| set_={ | ||
| "model_metadata": stmt.excluded.model_metadata, | ||
| "last_refreshed_at": stmt.excluded.last_refreshed_at, | ||
| }, | ||
| ) | ||
| await self.session.execute(stmt) |
There was a problem hiding this comment.
Prune stale discovered models during provider refresh
upsert_discovered_models() only performs INSERT ... ON CONFLICT DO UPDATE for models returned by the latest discovery run, but it never deletes previously discovered rows that are now absent from the provider response. When a provider removes a model, that stale catalog row remains enabled/selectable and can still be chosen as default, leading to downstream runtime failures against a model that no longer exists. Refresh should reconcile the full set for (organization_id, custom_provider_id, model_provider) by removing rows not present in the new discovery payload.
Useful? React with 👍 / 👎.
Summary by cubic
Implements the v2 LLM providers control plane with a shared model catalog, encrypted custom provider management, org/workspace access control, and a canonical default‑model selection. Adds audited, scope‑checked REST APIs and updates the
frontendclient for these endpoints.New Features
/organization/agent-catalog): list/get with provider/name filters and cursor pagination; create/update/delete; platform bulk upserts and discovery upserts; effective workspace models via GET/workspaces/{workspace_id}/agent-models; audit events; platform catalog bumped with updated capability flags./organization/agent-custom-providers): CRUD with encrypted API keys and custom headers; strictbase_urlvalidation (HTTP/HTTPS with hostname); passthrough forwarding fix; connection validation; refresh to sync models into the catalog; discovery status tracking (never/running/succeeded/failed); resolves catalog config with decrypted creds/passthrough; audit events./organization/agent-model-access): enable/disable models at org or workspace; list enabled models with cursor; audit events./agent/default-model-selection): GET/PUT the org’s canonical default bycatalog_id; storescatalog_id,model_provider,model_name, and optionalcustom_provider_id; resolves legacy name‑only defaults;frontendadds types and servicesagentGetDefaultModelSelectionandagentSetDefaultModelSelection.Migration
organization_idis NULL) are readable across orgs but not writable; org‑owned rows remain protected.TRACECAT__DB_ENCRYPTION_KEYto enable encryption for provider secrets.Written for commit 43b3a6b. Summary will update on new commits. Review in cubic