
fix(rocky-ai): credential redaction + testgen path canonicalization + token budget#287

Merged
hugocorreia90 merged 1 commit into main from fix/rocky-ai-hardening
Apr 29, 2026

Conversation

@hugocorreia90
Contributor

Three issues from the rocky-ai maintenance sweep, landed together since they all touch the same crate and the third one cascades into rocky-core and the schema codegen.

Issue 6a — credential redaction (HIGH/SEC)

AiConfig stored api_key: String while deriving Debug + Serialize. Any future tracing::error!(?config) (or tracing::debug!(?self.config) inside LlmClient) would write the raw API key straight into log files.

Fix: wrap api_key in rocky_core::redacted::RedactedString, which masks Debug and Display to *** by design. The raw value is exposed only at the outbound HTTP boundary in call_anthropic via RedactedString::expose(). Added a debug_does_not_leak_api_key regression test that asserts format!("{config:?}") never contains the secret and always contains ***.
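A minimal sketch of the redaction pattern described above — the type name and `expose()` match the description, but this is an illustrative reimplementation, not rocky_core's actual code:

```rust
use std::fmt;

/// Illustrative stand-in for rocky_core::redacted::RedactedString:
/// masks Debug and Display, exposes the raw value only on demand.
pub struct RedactedString(String);

impl RedactedString {
    pub fn new(s: impl Into<String>) -> Self {
        Self(s.into())
    }

    /// Only call this at the outbound HTTP boundary.
    pub fn expose(&self) -> &str {
        &self.0
    }
}

impl fmt::Debug for RedactedString {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("***")
    }
}

impl fmt::Display for RedactedString {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        f.write_str("***")
    }
}

fn main() {
    let key = RedactedString::new("sk-ant-secret");
    let debug = format!("{key:?}");
    // The regression test's invariant: no raw secret, always ***.
    assert!(!debug.contains("sk-ant-secret"));
    assert_eq!(debug, "***");
    assert_eq!(key.expose(), "sk-ant-secret");
}
```

The key design point is that redaction is the default: leaking now requires an explicit `expose()` call that stands out in review, rather than an accidental `?config` in a tracing macro.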

Issue 6b — testgen path traversal (HIGH/SEC)

testgen::save_tests built file paths by joining tests_dir with format!("{model}_{name}.sql", name = assertion.name.replace(' ', "_").to_lowercase()). The LLM controls assertion.name end-to-end, and `..`, `/`, and `\` slipped straight through replace(' ', "_") — direct path traversal from model output to the filesystem.

Fix: introduced a strict allow-list (is_safe_path_component) — ASCII alphanumeric, _, -, non-empty, no leading dot — and validated both the model name and every assertion name before any Path::join. Unsafe input rejects with io::ErrorKind::InvalidInput. Added the requested assertion.name = "../../../etc/passwd" regression test plus four siblings covering absolute paths, dotted names, unsafe model names, and the happy path.
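The allow-list check can be sketched as follows — the function name matches the description above, but the body is an illustrative reconstruction of the stated rules (ASCII alphanumeric, `_`, `-`, non-empty, no leading dot), not the crate's actual code:

```rust
use std::io;

/// Sketch of the strict allow-list: every character must be ASCII
/// alphanumeric, `_`, or `-`; the component must be non-empty and
/// must not start with a dot (which also rejects "." and "..").
fn is_safe_path_component(s: &str) -> bool {
    !s.is_empty()
        && !s.starts_with('.')
        && s
            .chars()
            .all(|c| c.is_ascii_alphanumeric() || c == '_' || c == '-')
}

/// Reject unsafe input with io::ErrorKind::InvalidInput before any Path::join.
fn validated(name: &str) -> io::Result<&str> {
    if is_safe_path_component(name) {
        Ok(name)
    } else {
        Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            format!("unsafe path component: {name:?}"),
        ))
    }
}

fn main() {
    assert!(validated("orders_not_null_id").is_ok()); // happy path
    assert!(validated("../../../etc/passwd").is_err()); // traversal
    assert!(validated("/etc/passwd").is_err()); // absolute path
    assert!(validated(".hidden").is_err()); // dotted name
    assert!(validated("").is_err()); // empty
}
```

An allow-list is the right shape here: a deny-list of `..`, `/`, and `\` would miss encodings and platform quirks, while "alphanumeric plus `_`/`-`" cannot express a traversal at all.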

Issue 6c — max_tokens hardcoded + unbounded retry budget (MED/PERF + cost)

call_anthropic hard-coded max_tokens: 4096 and generate_model retried 3× with no aggregate token / cost / time ceiling. A pathological run could pay for 3 × 4096 output tokens before fail-stopping.

Fix:

  • New [ai] max_tokens = N config block in rocky.toml (default 4096, preserving prior behaviour) — backed by a new AiSection struct wired into RockyConfig.
  • AiConfig::max_tokens is sourced from that config block in make_client and used both as the per-request cap on the Anthropic Messages API and as the cumulative output-token budget across retries.
  • The compile-verify retry loop in generate_model now sums response.output_tokens across attempts; when the running total exceeds the budget, it fail-stops with a new AiError::TokenBudgetExceeded variant instead of issuing another retry.
  • Documented the knob in docs/src/content/docs/reference/configuration.md (new ## [ai] section) and the docs/src/content/docs/guides/ai-features.md setup guide.
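The new knob described above would look like this in rocky.toml (value shown is the default):

```toml
# rocky.toml
[ai]
max_tokens = 4096   # per-request cap AND cumulative output-token budget across retries
```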

The Anthropic API caps each individual response at the per-request max_tokens, so attempt 1 alone can never trip the budget. The fail-stop kicks in only on attempt ≥ 2 with runaway accumulation — preserving typical-case behaviour while bounding the worst case.
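The budget interaction described above can be sketched as follows — the types and names (`Response`, `AiError`, the `call_model` closure) are illustrative stand-ins, not the crate's actual API:

```rust
/// Illustrative error type; mirrors the new TokenBudgetExceeded variant.
#[derive(Debug)]
enum AiError {
    TokenBudgetExceeded { used: u32, budget: u32 },
    Failed,
}

struct Response {
    output_tokens: u32,
    compiles: bool,
}

/// Sketch of the compile-verify retry loop: max_tokens serves as both the
/// per-request cap and the cumulative output-token budget across retries.
fn generate_with_budget(
    max_tokens: u32,
    mut call_model: impl FnMut(u32) -> Response,
) -> Result<Response, AiError> {
    let mut used = 0u32;
    for attempt in 1..=3 {
        // The API caps each individual response at max_tokens, so
        // attempt 1 alone can never trip the budget.
        let resp = call_model(max_tokens);
        used += resp.output_tokens;
        if resp.compiles {
            return Ok(resp);
        }
        // Fail-stop before issuing another retry once the running
        // total exceeds the budget (only reachable on attempt >= 2).
        if attempt < 3 && used > max_tokens {
            return Err(AiError::TokenBudgetExceeded { used, budget: max_tokens });
        }
    }
    Err(AiError::Failed)
}

fn main() {
    // Two failing attempts at 3000 output tokens each blow a 4096
    // budget before a third retry is ever issued.
    let mut attempts = 0;
    let result = generate_with_budget(4096, |_| {
        attempts += 1;
        Response { output_tokens: 3000, compiles: false }
    });
    assert!(matches!(result, Err(AiError::TokenBudgetExceeded { used: 6000, .. })));
    assert_eq!(attempts, 2);
}
```

Checking the budget only between attempts (rather than before attempt 1) is what preserves typical-case behaviour: a single successful generation is never penalised, while a runaway retry sequence is bounded.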

Codegen cascade

Adding AiSection to RockyConfig updates the autogenerated rocky_project.schema.json, the VS Code project schema and TypeScript types, and the Dagster Pydantic model. All regenerated via just codegen and committed in this PR so codegen-drift.yml stays green.

Test plan

  • cd engine && cargo test -p rocky-ai — 26 passed (includes new Debug-leak + path-traversal regression tests)
  • cd engine && cargo test -p rocky-core — 1122 passed (covers new AiSection field on RockyConfig)
  • cd engine && cargo test -p rocky-cli — 296 passed
  • cd engine && cargo clippy -p rocky-ai --all-targets -- -D warnings — clean
  • cd engine && cargo clippy --all-targets -- -D warnings — clean
  • cd engine && cargo fmt --check — clean
  • just codegen — regenerated rocky_project.schema.json, editors/vscode/schemas/rocky-project.schema.json, editors/vscode/src/types/generated/rocky_project.ts, integrations/dagster/src/dagster_rocky/types_generated/rocky_project_schema.py
  • cd integrations/dagster && uv run pytest — 458 passed

@hugocorreia90 hugocorreia90 merged commit ba4aeda into main Apr 29, 2026
15 checks passed
@hugocorreia90 hugocorreia90 deleted the fix/rocky-ai-hardening branch April 29, 2026 17:01
hugocorreia90 added a commit that referenced this pull request Apr 29, 2026
Engine 1.18.0 ships the rocky preview workflow end-to-end (#279, #280,
#281, #282), the [budget].max_bytes_scanned threshold (#288), the
audit-sweep closeout (#283, #285–#287, #290–#293), and the rocky-server
auth + CORS gate (#291).

Dagster 1.15.0 picks up the regenerated Pydantic models for the rocky
preview surface and ships the P1 cluster (#289) + FR-014 follow-on
(#284).

VS Code 1.10.0 regenerates TypeScript bindings for rocky preview and
RunCostSummary.total_bytes_scanned.

See per-artifact CHANGELOG entries for the full breakdown.
