
fix(rocky-ai): credential redaction + testgen path canonicalization + token budget#287

Merged
hugocorreia90 merged 1 commit into main from fix/rocky-ai-hardening
Apr 29, 2026

Conversation

@hugocorreia90
Contributor

Three issues from the rocky-ai maintenance sweep, landed together since they all touch the same crate and the third one cascades into rocky-core and the schema codegen.

Issue 6a — credential redaction (HIGH/SEC)

AiConfig stored api_key: String while deriving Debug + Serialize. Any future tracing::error!(?config) (or tracing::debug!(?self.config) inside LlmClient) would write the raw API key straight into log files.

Fix: wrap api_key in rocky_core::redacted::RedactedString, which masks Debug and Display to *** by design. The raw value is exposed only at the outbound HTTP boundary in call_anthropic via RedactedString::expose(). Added a debug_does_not_leak_api_key regression test that asserts format!("{config:?}") never contains the secret and always contains ***.
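A minimal sketch of the redaction pattern described above — the type name and `expose()` match the description, but this is an illustrative reimplementation, not rocky_core's actual code:

```rust
use std::fmt;

/// Illustrative stand-in for rocky_core::redacted::RedactedString:
/// masks Debug and Display, exposes the raw value only on demand.
pub struct RedactedString(String);

impl RedactedString {
    pub fn new(s: impl Into<String>) -> Self {
        Self(s.into())
    }

    /// Only call this at the outbound HTTP boundary.
    pub fn expose(&self) -> &str {
        &self.0
    }
}

impl fmt::Debug for RedactedString {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("***")
    }
}

impl fmt::Display for RedactedString {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("***")
    }
}

fn main() {
    let key = RedactedString::new("sk-ant-secret");
    let debug = format!("{key:?}");
    // The regression test's invariant: no raw secret, always ***.
    assert!(!debug.contains("sk-ant-secret"));
    assert_eq!(debug, "***");
    assert_eq!(key.expose(), "sk-ant-secret");
}
```

The key design point is that redaction is the default: leaking now requires an explicit `expose()` call that stands out in review, rather than an accidental `?config` in a tracing macro.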

Issue 6b — testgen path traversal (HIGH/SEC)

testgen::save_tests built file paths by joining tests_dir with format!("{model}_{name}.sql", name = assertion.name.replace(' ', "_").to_lowercase()). The LLM controls assertion.name end-to-end, and `..`, `/`, and `\` slipped straight through replace(' ', "_") — direct path traversal from model output to the filesystem.

Fix: introduced a strict allow-list (is_safe_path_component) — ASCII alphanumeric, _, -, non-empty, no leading dot — and validated both the model name and every assertion name before any Path::join. Unsafe input rejects with io::ErrorKind::InvalidInput. Added the requested assertion.name = "../../../etc/passwd" regression test plus four siblings covering absolute paths, dotted names, unsafe model names, and the happy path.
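The allow-list check can be sketched as follows — the function name matches the description above, but the body is an illustrative reconstruction of the stated rules (ASCII alphanumeric, `_`, `-`, non-empty, no leading dot), not the crate's actual code:

```rust
use std::io;

/// Sketch of the strict allow-list: every character must be ASCII
/// alphanumeric, `_`, or `-`; the component must be non-empty and
/// must not start with a dot (which also rejects "." and "..").
fn is_safe_path_component(s: &str) -> bool {
    !s.is_empty()
        && !s.starts_with('.')
        && s
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-')
}

/// Reject unsafe input with io::ErrorKind::InvalidInput before any Path::join.
fn validated(name: &str) -> io::Result<&str> {
    if is_safe_path_component(name) {
        Ok(name)
    } else {
        Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            format!("unsafe path component: {name:?}"),
        ))
    }
}

fn main() {
    assert!(validated("orders_not_null_id").is_ok()); // happy path
    assert!(validated("../../../etc/passwd").is_err()); // traversal
    assert!(validated("/etc/passwd").is_err()); // absolute path
    assert!(validated(".hidden").is_err()); // dotted name
    assert!(validated("").is_err()); // empty
}
```

An allow-list is the right shape here: a deny-list of `..`, `/`, and `\` would miss encodings and platform quirks, while "alphanumeric plus `_`/`-`" cannot express a traversal at all.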

Issue 6c — max_tokens hardcoded + unbounded retry budget (MED/PERF + cost)

call_anthropic hard-coded max_tokens: 4096 and generate_model retried 3× with no aggregate token / cost / time ceiling. A pathological run could pay for 3 × 4096 output tokens before fail-stopping.

Fix:

  • New [ai] max_tokens = N config block in rocky.toml (default 4096, preserving prior behaviour) — backed by a new AiSection struct wired into RockyConfig.
  • AiConfig::max_tokens is sourced from that config block in make_client and used both as the per-request cap on the Anthropic Messages API and as the cumulative output-token budget across retries.
  • The compile-verify retry loop in generate_model now sums response.output_tokens across attempts; when the running total exceeds the budget, it fail-stops with a new AiError::TokenBudgetExceeded variant instead of issuing another retry.
  • Documented the knob in docs/src/content/docs/reference/configuration.md (new ## [ai] section) and the docs/src/content/docs/guides/ai-features.md setup guide.
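The new knob described above would look like this in rocky.toml (value shown is the default):

```toml
# rocky.toml
[ai]
max_tokens = 4096   # per-request cap AND cumulative output-token budget across retries
```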

The Anthropic API caps each individual response at the per-request max_tokens, so attempt 1 alone can never trip the budget. The fail-stop kicks in only on attempt ≥ 2 with runaway accumulation — preserving typical-case behaviour while bounding the worst case.
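The budget interaction described above can be sketched as follows — the types and names (`Response`, `AiError`, the `call_model` closure) are illustrative stand-ins, not the crate's actual API:

```rust
/// Illustrative error type; mirrors the new TokenBudgetExceeded variant.
#[derive(Debug)]
enum AiError {
    TokenBudgetExceeded { used: u32, budget: u32 },
    Failed,
}

struct Response {
    output_tokens: u32,
    compiles: bool,
}

/// Sketch of the compile-verify retry loop: max_tokens serves as both the
/// per-request cap and the cumulative output-token budget across retries.
fn generate_with_budget(
    max_tokens: u32,
    mut call_model: impl FnMut(u32) -> Response,
) -> Result<Response, AiError> {
    let mut used = 0u32;
    for attempt in 1..=3 {
        // The API caps each individual response at max_tokens, so
        // attempt 1 alone can never trip the budget.
        let resp = call_model(max_tokens);
        used += resp.output_tokens;
        if resp.compiles {
            return Ok(resp);
        }
        // Fail-stop before issuing another retry once the running
        // total exceeds the budget (only reachable on attempt >= 2).
        if attempt < 3 && used > max_tokens {
            return Err(AiError::TokenBudgetExceeded { used, budget: max_tokens });
        }
    }
    Err(AiError::Failed)
}

fn main() {
    // Two failing attempts at 3000 output tokens each blow a 4096
    // budget before a third retry is ever issued.
    let mut attempts = 0;
    let result = generate_with_budget(4096, |_| {
        attempts += 1;
        Response { output_tokens: 3000, compiles: false }
    });
    assert!(matches!(result, Err(AiError::TokenBudgetExceeded { used: 6000, .. })));
    assert_eq!(attempts, 2);
}
```

Checking the budget only between attempts (rather than before attempt 1) is what preserves typical-case behaviour: a single successful generation is never penalised, while a runaway retry sequence is bounded.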

Codegen cascade

Adding AiSection to RockyConfig updates the autogenerated rocky_project.schema.json, the VS Code project schema and TypeScript types, and the Dagster Pydantic model. All regenerated via just codegen and committed in this PR so codegen-drift.yml stays green.

Test plan

  • cd engine && cargo test -p rocky-ai — 26 passed (includes new Debug-leak + path-traversal regression tests)
  • cd engine && cargo test -p rocky-core — 1122 passed (covers new AiSection field on RockyConfig)
  • cd engine && cargo test -p rocky-cli — 296 passed
  • cd engine && cargo clippy -p rocky-ai --all-targets -- -D warnings — clean
  • cd engine && cargo clippy --all-targets -- -D warnings — clean
  • cd engine && cargo fmt --check — clean
  • just codegen — regenerated rocky_project.schema.json, editors/vscode/schemas/rocky-project.schema.json, editors/vscode/src/types/generated/rocky_project.ts, integrations/dagster/src/dagster_rocky/types_generated/rocky_project_schema.py
  • cd integrations/dagster && uv run pytest — 458 passed

@hugocorreia90 hugocorreia90 merged commit ba4aeda into main Apr 29, 2026
15 checks passed
@hugocorreia90 hugocorreia90 deleted the fix/rocky-ai-hardening branch April 29, 2026 17:01
hugocorreia90 added a commit that referenced this pull request Apr 29, 2026
Engine 1.18.0 ships the rocky preview workflow end-to-end (#279, #280,
#281, #282), the [budget].max_bytes_scanned threshold (#288), the
audit-sweep closeout (#283, #285–#287, #290–#293), and the rocky-server
auth + CORS gate (#291).

Dagster 1.15.0 picks up the regenerated Pydantic models for the rocky
preview surface and ships the P1 cluster (#289) + FR-014 follow-on
(#284).

VS Code 1.10.0 regenerates TypeScript bindings for rocky preview and
RunCostSummary.total_bytes_scanned.

See per-artifact CHANGELOG entries for the full breakdown.
