feat(engine/rocky-fivetran): surface connector config metadata via SourceOutput.metadata #225

Merged
hugocorreia90 merged 1 commit into main from feat/fivetran-adapter-connector-metadata
Apr 22, 2026

@hugocorreia90
Contributor

Summary

  • Fivetran-integrated Rocky users who need to branch logic on connector type (stock vs custom reports, schema prefix, custom tables) can now read SourceOutput.metadata.<field> directly from rocky discover output instead of re-parsing connector-type strings or re-calling the Fivetran REST API.
  • Adapter-neutral metadata: IndexMap<String, serde_json::Value> field added to both rocky_core::source::DiscoveredConnector and rocky_cli::output::SourceOutput. Populated per-adapter; other adapters (Airbyte, DuckDB, Iceberg) opt in by stamping keys — no trait change required.
  • The Fivetran adapter is the first populator. metadata_from_connector projects a stable, namespaced subset of the Fivetran connector config into the adapter-neutral map under fivetran.* keys.
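As a rough illustration of the consumer side, the sketch below branches on the fivetran.* keys of a single source entry. The sample payload, its values, and the classify_source helper are hypothetical; only the metadata field and the key names come from this PR, and real rocky discover output carries more fields than shown here.

```python
import json

# Hypothetical, trimmed source entry; only `metadata` and its key names
# come from this PR. The values are made up for illustration.
SAMPLE_SOURCE = json.loads("""
{
  "metadata": {
    "fivetran.service": "facebook_ads",
    "fivetran.connector_id": "conn_123",
    "fivetran.custom_reports": [{"name": "weekly_spend"}]
  }
}
""")


def classify_source(source: dict) -> str:
    """Classify a source by its Fivetran metadata (illustrative helper)."""
    # The `metadata` key is omitted from the JSON when the map is empty.
    metadata = source.get("metadata") or {}
    if "fivetran.service" not in metadata:
        return "non-fivetran"
    if metadata.get("fivetran.custom_reports"):
        return "fivetran-custom-reports"
    return "fivetran-stock"
```

Because the values are opaque JSON, any stricter classification (for example validating the shape of each report entry) would be the consumer's responsibility.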

Field shape

SourceOutput.metadata: IndexMap<String, serde_json::Value> with #[serde(default, skip_serializing_if = "IndexMap::is_empty")].

  • IndexMap (not HashMap) so iteration order is insertion-stable — the discover JSON output stays byte-stable across runs, which matters for the dagster fixture corpus and the codegen-drift CI check. Mirrors the existing SourceOutput.components precedent.
  • skip_serializing_if = "IndexMap::is_empty" so adapters that haven't opted in (DuckDB, Iceberg, Airbyte) produce the same wire output as before. Verified via two consecutive just regen-fixtures runs — the DuckDB playground fixture is byte-identical.
  • Values are opaque serde_json::Value (not a specific struct) so Rocky relays service-specific payloads verbatim without modelling every Fivetran service. Classification is downstream.
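The serialization behaviour described above can be mimicked in Python to make the intent concrete. This is an analogue, not the Rust implementation: serialize_source is a hypothetical stand-in for serde's skip_serializing_if, the id field is a placeholder, and Python's insertion-ordered dict plays the role of IndexMap.

```python
import json


def serialize_source(source_id: str, metadata: dict) -> str:
    """Serialize a source, omitting `metadata` when it is empty."""
    payload = {"id": source_id}  # `id` is a placeholder field
    if metadata:  # analogue of skip_serializing_if = "IndexMap::is_empty"
        payload["metadata"] = metadata
    # No sort_keys: output order follows insertion order, like IndexMap.
    return json.dumps(payload)


# An adapter that has not opted in produces the pre-change output.
plain = serialize_source("duckdb_src", {})

# Keys are emitted in the order the adapter stamped them.
stamped = serialize_source(
    "fivetran_src",
    {"fivetran.service": "salesforce", "fivetran.connector_id": "conn_123"},
)
```

Here plain contains no metadata key at all, which is the property that keeps fixtures for non-opted-in adapters unchanged.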

Fivetran projection keys

| Key | Source | Notes |
| --- | --- | --- |
| fivetran.service | Connector.service | Always populated |
| fivetran.connector_id | Connector.id | Always populated |
| fivetran.schema_prefix | config.schema_prefix | Present when config has it |
| fivetran.custom_tables | config.custom_tables | Present when config has it |
| fivetran.custom_reports | config.reports | Semantic rename from wire name reports; pairs the user-defined reports list with the stock-tables split downstream consumers actually need |
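The projection in the table can be sketched as follows. This is a Python rendering for illustration only; the real metadata_from_connector is Rust and its exact signature is not shown in this PR. The connector is assumed to be a plain dict with id, service, and an optional config, and "present" is assumed to mean "the key exists in config".

```python
from typing import Any


def metadata_from_connector(connector: dict[str, Any]) -> dict[str, Any]:
    """Project a connector into namespaced `fivetran.*` keys (Python sketch)."""
    # These two keys are always populated.
    metadata: dict[str, Any] = {
        "fivetran.service": connector["service"],
        "fivetran.connector_id": connector["id"],
    }
    config = connector.get("config") or {}
    # (wire name in config, projected key); note the `reports` rename.
    optional_keys = [
        ("schema_prefix", "fivetran.schema_prefix"),
        ("custom_tables", "fivetran.custom_tables"),
        ("reports", "fivetran.custom_reports"),
    ]
    for wire_name, projected in optional_keys:
        if wire_name in config:
            metadata[projected] = config[wire_name]  # relayed verbatim
    return metadata
```

Config keys outside this list are dropped, which matches the "stable, namespaced subset" framing: consumers see a fixed key set rather than the whole service-specific config blob.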

Codegen cascade (just codegen)

Regenerated with the new field:

  • schemas/discover.schema.json — new metadata property under SourceOutput
  • integrations/dagster/src/dagster_rocky/types_generated/discover_schema.py — autogenerated Pydantic (metadata: dict[str, Any] | None = None)
  • editors/vscode/src/types/generated/discover.ts — autogenerated TypeScript interface

Hand-written Python mirror

SourceInfo in integrations/dagster/src/dagster_rocky/types.py is the hand-written Pydantic model that RockyComponent / translator / sensor code consumes (parallel to the autogenerated SourceOutput). The metadata field is mirrored there as dict[str, Any] = Field(default_factory=dict) so translator/component code iterating source.metadata.items() doesn't need a None-guard. Note: the hand-written and autogenerated empty-case defaults diverge slightly ({} vs None) — the hand-written one is the intended consumer surface per the FR's downstream pattern.
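The difference between the two defaults can be seen in a small stand-alone sketch. It uses stdlib dataclasses rather than Pydantic so that it runs without dependencies; the class and function names are hypothetical and only mimic the {} versus None defaults described above.

```python
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class GeneratedStyleSource:
    """Mimics the autogenerated default: metadata may be None."""
    metadata: Optional[dict[str, Any]] = None


@dataclass
class HandWrittenStyleSource:
    """Mimics the hand-written default: metadata is always a dict."""
    metadata: dict[str, Any] = field(default_factory=dict)


def keys_generated(source: GeneratedStyleSource) -> list[str]:
    # A None-guard is required before iterating.
    return [key for key, _ in (source.metadata or {}).items()]


def keys_hand_written(source: HandWrittenStyleSource) -> list[str]:
    # No guard needed; an empty dict simply yields nothing.
    return [key for key, _ in source.metadata.items()]
```

Using a default factory also gives each instance its own dict, so mutating one source's metadata cannot leak into another.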

RockyDagsterTranslator.get_metadata forwards adapter-namespaced keys into AssetSpec.metadata: string values pass through, richer structures are JSON-encoded so Dagster's dict[str, str] contract holds (consumers json.loads() the projected blobs as needed).
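A minimal sketch of that forwarding rule, assuming nothing beyond what is stated above: forward_metadata is a hypothetical helper, not the translator's actual code, and whether the real get_metadata filters or renames keys is not shown in this PR.

```python
import json
from typing import Any


def forward_metadata(source_metadata: dict[str, Any]) -> dict[str, str]:
    """Flatten adapter metadata into string values (illustrative helper)."""
    forwarded: dict[str, str] = {}
    for key, value in source_metadata.items():
        # Strings pass through; everything else is JSON-encoded.
        forwarded[key] = value if isinstance(value, str) else json.dumps(value)
    return forwarded


asset_metadata = forward_metadata(
    {
        "fivetran.service": "salesforce",
        "fivetran.custom_tables": ["Account", "Lead"],
    }
)

# A consumer decodes the blobs it knows to be structured.
custom_tables = json.loads(asset_metadata["fivetran.custom_tables"])
```

One consequence of this scheme is that a consumer has to know which keys hold JSON blobs, since a plain string value and an encoded structure are both of type str once forwarded.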

Test plan

  • cargo test -p rocky-fivetran — 26 unit tests + 11 wiremock tests pass. 6 new unit tests added (3 metadata_from_connector projections, 2 end-to-end wiremock discover paths, 1 config-blob deserialization).
  • cargo test -p rocky-core — 207 tests pass. New round-trip test for DiscoveredConnector.metadata serialization stability.
  • cargo test -p rocky-cli — 1014 tests pass. Existing discover tests unchanged.
  • cargo test --workspace — no regressions across the 20-crate workspace.
  • cargo clippy --workspace --all-targets -- -D warnings — clean.
  • cargo fmt --all --check — clean.
  • just codegen — regenerates three files (schema, Pydantic, TS) cleanly.
  • just regen-fixtures — byte-stable across two consecutive runs (same MD5 for discover.json).
  • uv run pytest in integrations/dagster/ — 316 tests pass. 3 new tests added (DiscoverResult parsing with metadata, generated SourceOutput metadata exposure, translator forwarding + backward-compat regression guard).
  • uv run ruff check + ruff format --check — clean.
  • npm run compile in editors/vscode/ — clean TS build.

Forward pointer

Other adapters (Airbyte, DuckDB, Iceberg, Snowflake sources, custom) opt in by populating the optional field under their own namespaced keys (airbyte.*, snowflake.share_id, bigquery.labels, ...). No trait change is required — the DiscoveryAdapter::discover return type already carries the metadata map.

…urceOutput.metadata

Fivetran-integrated Rocky users who need to branch logic on connector
type (stock vs custom reports, schema prefix, custom tables) can now
read `SourceOutput.metadata.<field>` directly from `rocky discover`
output instead of re-parsing connector-type strings or re-calling the
Fivetran REST API.

Pipeline:
  Fivetran `Connector.config` (serde_json::Value, wire-level)
    -> `DiscoveredConnector.metadata` (IndexMap<String, Value>, core)
    -> `SourceOutput.metadata`         (IndexMap<String, Value>, CLI)

`metadata_from_connector` projects a stable, namespaced subset into the
adapter-neutral map:

  * `fivetran.service`        (always)
  * `fivetran.connector_id`   (always)
  * `fivetran.schema_prefix`  (when present in config)
  * `fivetran.custom_tables`  (when present in config)
  * `fivetran.custom_reports` (renamed from wire-level `config.reports`
                               — the semantic rename pairs stock tables
                               with the user-defined reports list so
                               downstream consumers can branch cleanly)

Field shape is `IndexMap<String, serde_json::Value>` (not a
specific-shaped struct) so Rocky relays service-specific payloads
verbatim without modelling every Fivetran service. `IndexMap` (not
`HashMap`) keeps iteration order insertion-stable so the discover JSON
output is byte-stable across runs — important for the dagster fixture
corpus and the `codegen-drift` CI check. The field is tagged
`skip_serializing_if = "IndexMap::is_empty"` so adapters that haven't
opted in (DuckDB, Iceberg, Airbyte) produce the same wire output as
before.

Other adapters: opt in by populating the optional field; no trait
change required. `DiscoveredConnector.metadata` defaults to empty so
existing call sites compile unchanged after adding the field.

Codegen cascade (`just codegen`):
  * schemas/discover.schema.json           — new `metadata` property
  * integrations/dagster/.../discover_schema.py — autogenerated Pydantic
  * editors/vscode/src/types/generated/discover.ts — autogenerated TS

Dagster side:
  * `SourceInfo.metadata: dict[str, Any]` — hand-written mirror of the
    autogenerated field, defaults to `{}` so translator/component code
    iterating `source.metadata.items()` doesn't need a None-guard.
  * `RockyDagsterTranslator.get_metadata` forwards adapter-namespaced
    keys into `AssetSpec.metadata`. String values pass through; richer
    structures are JSON-encoded so Dagster's `dict[str, str]` contract
    holds — consumers `json.loads()` the projected blobs as needed.

Fixture stability: `just regen-fixtures` run twice produces byte-stable
output (playground POC is DuckDB-only and never stamps metadata).

Tests: 6 new unit tests in `rocky-fivetran` (3 `metadata_from_connector`
projections, 2 end-to-end wiremock discover paths, 1 config-blob
deserialization), 1 round-trip test in `rocky-core`, 3 new Python tests
(DiscoverResult parsing, generated SourceOutput exposure,
translator forwarding + backward-compat regression guard).
@hugocorreia90 hugocorreia90 merged commit 60fe354 into main Apr 22, 2026
14 of 15 checks passed
@hugocorreia90 hugocorreia90 deleted the feat/fivetran-adapter-connector-metadata branch April 22, 2026 16:17
hugocorreia90 added a commit that referenced this pull request Apr 22, 2026
* chore: release engine-v1.14.0 + dagster-v1.10.0 + vscode-v1.6.4

Bumps all three artifacts to cover the 16-PR cascade since engine-v1.13.0
/ dagster-v1.9.0 / vscode-v1.6.3. Details in each CHANGELOG.

Engine headlines (12 PRs):
- Arc 7 wave 2 complete — cached DESCRIBE end-to-end
  (#223 infra, #228 reads, #230 write tap, #231 discover warm-up,
  #232 state controls + --cache-ttl override)
- Arc 2 wave 3 complete — bytes_scanned / bytes_written on
  MaterializationOutput (#219 BQ, #221 Databricks, #220 Snowflake
  deferred doc, #222 docstring cascade). Real $ on rocky cost for
  BQ + Databricks
- FR-005 Unity Catalog workspace-binding reconcile (#226)
- FR-002 Fivetran connector metadata via SourceOutput.metadata (#225)
- Housekeeping: compute_backoff dedup into rocky_core::retry (#217)

Dagster headlines (4 PRs):
- FR-001 RockyComponent Pipes execution mode + FR-006 strict doctor
  on RockyResource startup (#224)
- FR-003 RockyResource.state_health() (#227) + FR follow-up threading
  doctor(check=state_rw) for sub-second probes (#229)
- RockyResource.cost() wiring + fixture (#218)

VS Code: regenerated TS bindings for engine 1.14.0 type additions.
No extension feature changes.

* chore(integrations/dagster): regenerate test fixtures for engine 1.14.0

36 fixtures picked up the new engine version string in their top-level
"version" field. No schema changes — just the version bump.