Skip to content

fix: ProcessAdapter concurrency + percent-encode URL path components in source adapters#290

Merged
hugocorreia90 merged 1 commit intomainfrom
fix/process-adapter-and-url-injection
Apr 29, 2026
Merged

fix: ProcessAdapter concurrency + percent-encode URL path components in source adapters#290
hugocorreia90 merged 1 commit intomainfrom
fix/process-adapter-and-url-injection

Conversation

@hugocorreia90
Copy link
Copy Markdown
Contributor

Summary

Two related defensive fixes to source-adapter / SPI infrastructure.

ProcessAdapter::call is not safe for concurrent use. It acquires the stdin Mutex to write the request, releases it, then acquires stdout to read the response. Two concurrent callers can therefore interleave on the JSON-RPC pipe; the per-call id-mismatch check turns the race into a hard error rather than silent corruption, but the entire process-adapter SPI is effectively single-threaded per instance — undocumented. Adds a call_lock: Mutex<()> held across the entire write→read pair so calls serialize cleanly, and documents the constraint on the function. The id-mismatch check stays as a defence against a misbehaving adapter.

Source-adapter HTTP clients leaked user-supplied IDs into URL paths via format!() without percent-encoding. Affects:

  • rocky-fivetrangroup_id and connector_id in /v1/groups/{}/connectors, /v1/connectors/{}, /v1/connectors/{}/schemas (interpolated in connector.rs / schema.rs; the client.rs GET helper just stitches the path onto base_url).
  • rocky-airbyteid in GET /v1/connections/{}.
  • rocky-iceberg — namespace parts in GET /v1/namespaces/{}/tables. The existing replace('.', "%1F") is the spec separator, not encoding; the parts themselves were unencoded.

Fix: a small percent-encoding helper in each crate using RFC 3986 unreserved chars minus . (so .. cannot traverse, while ordinary identifiers like my_id-v2 round-trip cleanly). For Iceberg the order is split → encode each part → join with %1F; encoding after the join would re-encode the separator's %.

Adds percent-encoding to [workspace.dependencies] and to each of the three source-adapter crates.

Test plan

  • cd engine && cargo test -p rocky-adapter-sdk -p rocky-fivetran -p rocky-airbyte -p rocky-iceberg --features rocky-fivetran/test-support — green
  • cd engine && cargo clippy -p rocky-adapter-sdk -p rocky-fivetran -p rocky-airbyte -p rocky-iceberg --all-targets --features rocky-fivetran/test-support -- -D warnings — clean
  • cd engine && cargo fmt --check — clean
  • cd engine && cargo build --workspace — clean

New tests:

  • process::tests::test_concurrent_calls_do_not_swap_ids (cfg(unix)) spawns a Python-based JSON-RPC echo adapter and tokio::join!s two calls with distinct tags; asserts each caller receives its own tag/id back. Skips silently if python3 is unavailable.
  • wiremock_tests::test_list_connectors_percent_encodes_group_id, test_get_connector_percent_encodes_connector_id, test_get_schema_config_percent_encodes_connector_id — wire-level Fivetran tests where the mock matches only on the encoded path.
  • Airbyte unit test asserting get_connection's URL escapes /, ?, #, .., and %.
  • Iceberg unit tests covering each metacharacter per part, the ..-via-split case, and the multi-level "encode-then-join" ordering invariant.

…in source adapters

ProcessAdapter::call previously released the stdin lock before acquiring
stdout, so two concurrent callers could interleave on the JSON-RPC pipe.
The per-call id-mismatch guard turned the race into a hard error rather
than corruption, but it also made the entire process-adapter SPI
effectively single-threaded per instance — undocumented. Add a
`call_lock: Mutex<()>` held across the write→read pair so concurrent
calls serialize cleanly, and document on the function that the
underlying child process is single-threaded. A `cfg(unix)` test spawns a
small Python JSON-RPC echo and asserts two `tokio::join!`'d calls each
receive their own response.

The Fivetran, Airbyte, and Iceberg HTTP clients interpolated
caller-controlled IDs (connector_id, group_id, connection_id, namespace
parts) into URL paths via `format!()` without percent-encoding,
exposing path-traversal and query-string injection. Add a
percent-encoding helper in each crate using RFC 3986 unreserved chars
minus `.` so `..` cannot traverse, and apply it at every interpolation
site. For Iceberg, the existing `replace('.', "%1F")` separator logic
is preserved by splitting on `.` first, encoding each part, then
joining with the literal `%1F` — encoding after the join would break
the spec separator.

Adds `percent-encoding` to the workspace deps and to the three source
adapter crates. Includes regression tests at the wiremock layer for
Fivetran (group_id, connector_id, schema endpoint) and unit tests for
Airbyte and Iceberg verifying that `/`, `?`, `#`, `..`, and `%` are
correctly escaped.
@hugocorreia90 hugocorreia90 merged commit 0b92feb into main Apr 29, 2026
12 checks passed
@hugocorreia90 hugocorreia90 deleted the fix/process-adapter-and-url-injection branch April 29, 2026 18:16
hugocorreia90 added a commit that referenced this pull request Apr 29, 2026
Engine 1.18.0 ships the rocky preview workflow end-to-end (#279, #280,
#281, #282), the [budget].max_bytes_scanned threshold (#288), the
audit-sweep closeout (#283, #285#287, #290#293), and the rocky-server
auth + CORS gate (#291).

Dagster 1.15.0 picks up the regenerated Pydantic models for the rocky
preview surface and ships the P1 cluster (#289) + FR-014 follow-on
(#284).

VS Code 1.10.0 regenerates TypeScript bindings for rocky preview and
RunCostSummary.total_bytes_scanned.

See per-artifact CHANGELOG entries for the full breakdown.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant