fix: ProcessAdapter concurrency + percent-encode URL path components in source adapters#290
Merged
hugocorreia90 merged 1 commit intomainfrom Apr 29, 2026
Merged
Conversation
…in source adapters
ProcessAdapter::call previously released the stdin lock before acquiring
stdout, so two concurrent callers could interleave on the JSON-RPC pipe.
The per-call id-mismatch guard turned the race into a hard error rather
than corruption, but it also made the entire process-adapter SPI
effectively single-threaded per instance — undocumented. Add a
`call_lock: Mutex<()>` held across the write→read pair so concurrent
calls serialize cleanly, and document on the function that the
underlying child process is single-threaded. A `cfg(unix)` test spawns a
small Python JSON-RPC echo and asserts two `tokio::join!`'d calls each
receive their own response.
The Fivetran, Airbyte, and Iceberg HTTP clients interpolated
caller-controlled IDs (connector_id, group_id, connection_id, namespace
parts) into URL paths via `format!()` without percent-encoding,
exposing path-traversal and query-string injection. Add a
percent-encoding helper in each crate using RFC 3986 unreserved chars
minus `.` so `..` cannot traverse, and apply it at every interpolation
site. For Iceberg, the existing `replace('.', "%1F")` separator logic
is preserved by splitting on `.` first, encoding each part, then
joining with the literal `%1F` — encoding after the join would break
the spec separator.
Adds `percent-encoding` to the workspace deps and to the three source
adapter crates. Includes regression tests at the wiremock layer for
Fivetran (group_id, connector_id, schema endpoint) and unit tests for
Airbyte and Iceberg verifying that `/`, `?`, `#`, `..`, and `%` are
correctly escaped.
5 tasks
hugocorreia90
added a commit
that referenced
this pull request
Apr 29, 2026
Engine 1.18.0 ships the rocky preview workflow end-to-end (#279, #280, #281, #282), the [budget].max_bytes_scanned threshold (#288), the audit-sweep closeout (#283, #285–#287, #290–#293), and the rocky-server auth + CORS gate (#291). Dagster 1.15.0 picks up the regenerated Pydantic models for the rocky preview surface and ships the P1 cluster (#289) + FR-014 follow-on (#284). VS Code 1.10.0 regenerates TypeScript bindings for rocky preview and RunCostSummary.total_bytes_scanned. See per-artifact CHANGELOG entries for the full breakdown.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two related defensive fixes to source-adapter / SPI infrastructure.
ProcessAdapter::callis not safe for concurrent use. It acquires the stdinMutexto write the request, releases it, then acquires stdout to read the response. Two concurrent callers can therefore interleave on the JSON-RPC pipe; the per-callid-mismatch check turns the race into a hard error rather than silent corruption, but the entire process-adapter SPI is effectively single-threaded per instance — undocumented. Adds acall_lock: Mutex<()>held across the entire write→read pair so calls serialize cleanly, and documents the constraint on the function. The id-mismatch check stays as a defence against a misbehaving adapter.Source-adapter HTTP clients leaked user-supplied IDs into URL paths via
format!()without percent-encoding. Affects:rocky-fivetran—group_idandconnector_idin/v1/groups/{}/connectors,/v1/connectors/{},/v1/connectors/{}/schemas(interpolated inconnector.rs/schema.rs; theclient.rsGET helper just stitches the path ontobase_url).rocky-airbyte—idinGET /v1/connections/{}.rocky-iceberg— namespace parts inGET /v1/namespaces/{}/tables. The existingreplace('.', "%1F")is the spec separator, not encoding; the parts themselves were unencoded.Fix: a small percent-encoding helper in each crate using RFC 3986 unreserved chars minus
.(so..cannot traverse, while ordinary identifiers likemy_id-v2round-trip cleanly). For Iceberg the order is split → encode each part → join with%1F; encoding after the join would re-encode the separator's%.Adds
percent-encodingto[workspace.dependencies]and to each of the three source-adapter crates.Test plan
cd engine && cargo test -p rocky-adapter-sdk -p rocky-fivetran -p rocky-airbyte -p rocky-iceberg --features rocky-fivetran/test-support— greencd engine && cargo clippy -p rocky-adapter-sdk -p rocky-fivetran -p rocky-airbyte -p rocky-iceberg --all-targets --features rocky-fivetran/test-support -- -D warnings— cleancd engine && cargo fmt --check— cleancd engine && cargo build --workspace— cleanNew tests:
process::tests::test_concurrent_calls_do_not_swap_ids(cfg(unix)) spawns a Python-based JSON-RPC echo adapter andtokio::join!s twocalls with distinct tags; asserts each caller receives its own tag/id back. Skips silently ifpython3is unavailable.wiremock_tests::test_list_connectors_percent_encodes_group_id,test_get_connector_percent_encodes_connector_id,test_get_schema_config_percent_encodes_connector_id— wire-level Fivetran tests where the mock matches only on the encoded path.get_connection's URL escapes/,?,#,.., and%...-via-split case, and the multi-level "encode-then-join" ordering invariant.