feat(engine/rocky-fivetran): surface connector config metadata via SourceOutput.metadata #225
Merged
hugocorreia90 merged 1 commit into main on Apr 22, 2026
Fivetran-integrated Rocky users who need to branch logic on connector
type (stock vs custom reports, schema prefix, custom tables) can now
read `SourceOutput.metadata.<field>` directly from `rocky discover`
output instead of re-parsing connector-type strings or re-calling the
Fivetran REST API.
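
A downstream consumer might branch on these keys roughly as follows. This is a minimal sketch, assuming one source entry from the discover output has already been parsed into a dict; the function name, the labels, and the sample values are hypothetical, and only the `fivetran.*` key names come from this PR.

```python
from typing import Any


def classify_fivetran_source(source: dict[str, Any]) -> str:
    """Return a coarse label for one parsed discover source entry (hypothetical helper)."""
    # The metadata field is omitted from the output when empty, so default to {}.
    metadata = source.get("metadata") or {}
    if "fivetran.service" not in metadata:
        return "not-fivetran"
    if metadata.get("fivetran.custom_reports"):
        return "custom-reports"
    if metadata.get("fivetran.custom_tables"):
        return "custom-tables"
    return "stock"


# Sample entry with made-up values; real payloads are relayed verbatim from Fivetran.
sample_source = {
    "metadata": {
        "fivetran.service": "facebook_ads",
        "fivetran.connector_id": "conn_123",
        "fivetran.custom_reports": [{"name": "daily_spend"}],
    }
}
label = classify_fivetran_source(sample_source)
```

The point of the sketch is that the branch reads namespaced keys directly, with no connector-type string parsing and no extra REST call.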
Pipeline:
Fivetran `Connector.config` (serde_json::Value, wire-level)
-> `DiscoveredConnector.metadata` (IndexMap<String, Value>, core)
-> `SourceOutput.metadata` (IndexMap<String, Value>, CLI)
`metadata_from_connector` projects a stable, namespaced subset into the
adapter-neutral map:
* `fivetran.service` (always)
* `fivetran.connector_id` (always)
* `fivetran.schema_prefix` (when present in config)
* `fivetran.custom_tables` (when present in config)
* `fivetran.custom_reports` (renamed from wire-level `config.reports`; the semantic rename pairs stock tables with the user-defined reports list so downstream consumers can branch cleanly)
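
The projection rules listed above can be paraphrased in Python. This is only an illustration of the mapping, not the Rust `metadata_from_connector` itself: `project_connector_metadata` and the dict-shaped connector are hypothetical, and treating "present" as "key exists in config" is an assumption.

```python
from typing import Any

# Wire-level config key -> namespaced metadata key, in insertion order.
CONFIG_PROJECTIONS = (
    ("schema_prefix", "fivetran.schema_prefix"),
    ("custom_tables", "fivetran.custom_tables"),
    ("reports", "fivetran.custom_reports"),  # the semantic rename
)


def project_connector_metadata(connector: dict[str, Any]) -> dict[str, Any]:
    """Project a stable, namespaced subset of a connector into a flat map."""
    config = connector.get("config") or {}
    # These two keys are always stamped.
    metadata: dict[str, Any] = {
        "fivetran.service": connector["service"],
        "fivetran.connector_id": connector["id"],
    }
    # Optional keys are copied verbatim only when the config carries them.
    for wire_key, projected_key in CONFIG_PROJECTIONS:
        if wire_key in config:
            metadata[projected_key] = config[wire_key]
    return metadata


# Made-up connector; config keys outside the projection are not forwarded.
projected = project_connector_metadata(
    {
        "id": "conn_123",
        "service": "facebook_ads",
        "config": {"reports": [{"name": "daily_spend"}], "unrelated_key": "x"},
    }
)
```

Python dicts keep insertion order, which stands in here for the ordering guarantee `IndexMap` gives on the Rust side.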
Field shape is `IndexMap<String, serde_json::Value>` (not a
specific-shaped struct) so Rocky relays service-specific payloads
verbatim without modelling every Fivetran service. `IndexMap` (not
`HashMap`) keeps iteration order insertion-stable so the discover JSON
output is byte-stable across runs — important for the dagster fixture
corpus and the `codegen-drift` CI check. The field is tagged
`skip_serializing_if = "IndexMap::is_empty"` so adapters that haven't
opted in (DuckDB, Iceberg, Airbyte) produce the same wire output as
before.
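
The two serialization properties described above (insertion-stable ordering and omitting the field when empty) can be illustrated in Python. This is a sketch under assumptions: `serialize_source` and the `name` field are hypothetical stand-ins, and the real behaviour comes from serde attributes on the Rust struct.

```python
import json
from typing import Any


def serialize_source(name: str, metadata: dict[str, Any]) -> str:
    """Serialize one source entry, omitting `metadata` when it is empty."""
    payload: dict[str, Any] = {"name": name}
    if metadata:  # analogous to skip_serializing_if = "IndexMap::is_empty"
        payload["metadata"] = metadata
    # No sort_keys: keys are emitted in insertion order, as with IndexMap.
    return json.dumps(payload)


# An adapter that has not opted in produces the same output as before the field existed.
without_metadata = serialize_source("duckdb_playground", {})

# Two runs over the same input are byte-identical.
stamped = {"fivetran.service": "s3", "fivetran.connector_id": "conn_123"}
first_run = serialize_source("fivetran_s3", stamped)
second_run = serialize_source("fivetran_s3", stamped)
```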
Other adapters: opt in by populating the optional field; no trait
change required. `DiscoveredConnector.metadata` defaults to empty so
existing call sites compile unchanged after adding the field.
Codegen cascade (`just codegen`):
* schemas/discover.schema.json — new `metadata` property
* integrations/dagster/.../discover_schema.py — autogenerated Pydantic
* editors/vscode/src/types/generated/discover.ts — autogenerated TS
Dagster side:
* `SourceInfo.metadata: dict[str, Any]` — hand-written mirror of the
autogenerated field, defaults to `{}` so translator/component code
iterating `source.metadata.items()` doesn't need a None-guard.
* `RockyDagsterTranslator.get_metadata` forwards adapter-namespaced
keys into `AssetSpec.metadata`. String values pass through; richer
structures are JSON-encoded so Dagster's `dict[str, str]` contract
holds — consumers `json.loads()` the projected blobs as needed.
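
Both Dagster-side behaviours can be sketched without importing Dagster. The sketch below assumes a dataclass stand-in for the hand-written Pydantic model; `SourceInfoSketch` and `forward_metadata` are hypothetical names, not the real `SourceInfo` or `RockyDagsterTranslator.get_metadata`.

```python
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class SourceInfoSketch:
    """Stand-in for the hand-written model; metadata defaults to {} rather than None."""

    metadata: dict[str, Any] = field(default_factory=dict)


def forward_metadata(source: SourceInfoSketch) -> dict[str, str]:
    """Flatten metadata to string values: strings pass through, the rest is JSON-encoded."""
    forwarded: dict[str, str] = {}
    # No None-guard needed because the default is an empty dict.
    for key, value in source.metadata.items():
        forwarded[key] = value if isinstance(value, str) else json.dumps(value)
    return forwarded


forwarded = forward_metadata(
    SourceInfoSketch(
        metadata={
            "fivetran.service": "facebook_ads",
            "fivetran.custom_tables": ["ads", "campaigns"],
        }
    )
)
# A consumer recovers the structured value with json.loads.
custom_tables = json.loads(forwarded["fivetran.custom_tables"])
```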
Fixture stability: `just regen-fixtures` run twice produces byte-stable
output (playground POC is DuckDB-only and never stamps metadata).
Tests: 6 new unit tests in `rocky-fivetran` (3 `metadata_from_connector`
projections, 2 end-to-end wiremock discover paths, 1 config-blob
deserialization), 1 round-trip test in `rocky-core`, 3 new Python tests
(DiscoverResult parsing, generated SourceOutput exposure,
translator forwarding + backward-compat regression guard).
hugocorreia90 added a commit that referenced this pull request on Apr 22, 2026
* chore: release engine-v1.14.0 + dagster-v1.10.0 + vscode-v1.6.4
  Bumps all three artifacts to cover the 16-PR cascade since engine-v1.13.0 / dagster-v1.9.0 / vscode-v1.6.3. Details in each CHANGELOG.
  Engine headlines (12 PRs):
  - Arc 7 wave 2 complete: cached DESCRIBE end-to-end (#223 infra, #228 reads, #230 write tap, #231 discover warm-up, #232 state controls + --cache-ttl override)
  - Arc 2 wave 3 complete: bytes_scanned / bytes_written on MaterializationOutput (#219 BQ, #221 Databricks, #220 Snowflake deferred doc, #222 docstring cascade). Real $ on rocky cost for BQ + Databricks
  - FR-005 Unity Catalog workspace-binding reconcile (#226)
  - FR-002 Fivetran connector metadata via SourceOutput.metadata (#225)
  - Housekeeping: compute_backoff dedup into rocky_core::retry (#217)
  Dagster headlines (4 PRs):
  - FR-001 RockyComponent Pipes execution mode + FR-006 strict doctor on RockyResource startup (#224)
  - FR-003 RockyResource.state_health() (#227) + FR follow-up threading doctor(check=state_rw) for sub-second probes (#229)
  - RockyResource.cost() wiring + fixture (#218)
  VS Code: regenerated TS bindings for engine 1.14.0 type additions. No extension feature changes.
* chore(integrations/dagster): regenerate test fixtures for engine 1.14.0
  36 fixtures picked up the new engine version string in their top-level "version" field. No schema changes, just the version bump.
Summary
* Fivetran-integrated Rocky users who need to branch logic on connector type (stock vs custom reports, schema prefix, custom tables) can now read `SourceOutput.metadata.<field>` directly from `rocky discover` output instead of re-parsing connector-type strings or re-calling the Fivetran REST API.
* `metadata: IndexMap<String, serde_json::Value>` field added to both `rocky_core::source::DiscoveredConnector` and `rocky_cli::output::SourceOutput`. Populated per-adapter; other adapters (Airbyte, DuckDB, Iceberg) opt in by stamping keys, with no trait change required.
* `metadata_from_connector` projects a stable, namespaced subset of the Fivetran connector config into the adapter-neutral map under `fivetran.*` keys.

Field shape
`SourceOutput.metadata: IndexMap<String, serde_json::Value>` with `#[serde(default, skip_serializing_if = "IndexMap::is_empty")]`.

* `IndexMap` (not `HashMap`) so iteration order is insertion-stable: the discover JSON output stays byte-stable across runs, which matters for the dagster fixture corpus and the `codegen-drift` CI check. Mirrors the existing `SourceOutput.components` precedent.
* `skip_serializing_if = "IndexMap::is_empty"` so adapters that haven't opted in (DuckDB, Iceberg, Airbyte) produce the same wire output as before. Verified via two consecutive `just regen-fixtures` runs: the DuckDB playground fixture is byte-identical.
* `serde_json::Value` (not a specific struct) so Rocky relays service-specific payloads verbatim without modelling every Fivetran service. Classification is downstream.

Fivetran projection keys
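| Key | Source | Notes |
| --- | --- | --- |
| `fivetran.service` | `Connector.service` | Always |
| `fivetran.connector_id` | `Connector.id` | Always |
| `fivetran.schema_prefix` | `config.schema_prefix` | When present in config |
| `fivetran.custom_tables` | `config.custom_tables` | When present in config |
| `fivetran.custom_reports` | `config.reports` | Renamed from `reports`: pairs the user-defined reports list with the stock-tables split downstream consumers actually need |

Codegen cascade (`just codegen`)
Regenerated with the new field: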
* `schemas/discover.schema.json`: new `metadata` property under `SourceOutput`
* `integrations/dagster/src/dagster_rocky/types_generated/discover_schema.py`: autogenerated Pydantic (`metadata: dict[str, Any] | None = None`)
* `editors/vscode/src/types/generated/discover.ts`: autogenerated TypeScript interface

Hand-written Python mirror
`SourceInfo` in `integrations/dagster/src/dagster_rocky/types.py` is the hand-written Pydantic model that `RockyComponent` / translator / sensor code consumes (parallel to the autogenerated `SourceOutput`). The `metadata` field is mirrored there as `dict[str, Any] = Field(default_factory=dict)` so translator/component code iterating `source.metadata.items()` doesn't need a None-guard. Note: the hand-written and autogenerated empty-case defaults diverge slightly (`{}` vs `None`); the hand-written one is the intended consumer surface per the FR's downstream pattern.

`RockyDagsterTranslator.get_metadata` forwards adapter-namespaced keys into `AssetSpec.metadata`: string values pass through, richer structures are JSON-encoded so Dagster's `dict[str, str]` contract holds (consumers `json.loads()` the projected blobs as needed).

Test plan
* `cargo test -p rocky-fivetran`: 26 unit tests + 11 wiremock tests pass. 6 new unit tests added (3 `metadata_from_connector` projections, 2 end-to-end wiremock discover paths, 1 config-blob deserialization).
* `cargo test -p rocky-core`: 207 tests pass. New round-trip test for `DiscoveredConnector.metadata` serialization stability.
* `cargo test -p rocky-cli`: 1014 tests pass. Existing discover tests unchanged.
* `cargo test --workspace`: no regressions across the 20-crate workspace.
* `cargo clippy --workspace --all-targets -- -D warnings`: clean.
* `cargo fmt --all --check`: clean.
* `just codegen`: regenerates three files (schema, Pydantic, TS) cleanly.
* `just regen-fixtures`: byte-stable across two consecutive runs (same MD5 for `discover.json`).
* `uv run pytest` in `integrations/dagster/`: 316 tests pass. 3 new tests added (`DiscoverResult` parsing with metadata, generated `SourceOutput` metadata exposure, translator forwarding + backward-compat regression guard).
* `uv run ruff check` + `ruff format --check`: clean.
* `npm run compile` in `editors/vscode/`: clean TS build.

Forward pointer
Other adapters (Airbyte, DuckDB, Iceberg, Snowflake sources, custom) opt in by populating the optional field under their own namespaced keys (`airbyte.*`, `snowflake.share_id`, `bigquery.labels`, ...). No trait change is required: the `DiscoveryAdapter::discover` return type already carries the metadata map.