feat(bigquery): enrich external table metadata with source format, URIs, compression, and max bad records#16348
Merged
…Is, compression, and max bad records
Contributor
Linear: ING-1748
Contributor
Thanks for the contribution, @EladLeev! This is a great contribution. @sgomezvillamor, want to take a stab at it?
Contributor
sgomezvillamor
left a comment
LGTM
Just left a couple of suggestions
…ss and parametrise tests
Contributor
Author
Thank you for the review! I've applied the changes now 🙏
Contributor
Author
@sgomezvillamor @gabe-lyons mind having another look? 😇
sgomezvillamor
approved these changes
Mar 2, 2026
Contributor
sgomezvillamor
left a comment
LGTM
thanks for the contrib
Contributor
Your PR has been assigned to sergio.gomez for review (ING-1748).
Contributor
Hi @sgomezvillamor, it looks like everything is green here (CI + approvals) and it's been waiting for a bit. Happy to merge it if you'd like, or feel free to take it 🙂
8c565d8
into
datahub-project:master
52 of 54 checks passed
david-leifker pushed a commit that referenced this pull request on May 27, 2026
- fix(ingestion): pin sqlglotc (#16614)
- fix(kafka): make replication factor configurable per topic (#16585)
- docs(ingestion): add request-connector page to fix dead link on Integrations page (#16617)
- docs: update announcement bar for March Town Hall (#16610)
- feat(dbt): Extract and emit stats from catalog.json (#16044)
- feat(bigquery): enrich external table metadata with source format, URIs, compression, and max bad records (#16348)
- fix(ui): Add fixes for ingestion, selecting glossaries in policies, and data product icons (#16627)
- feat(ingest/glue): Iceberg lineage (#16562)
- feat(powerbi): add external URL for Power BI App entities (#16572)
- fix(ingestion): bump authlib to >=1.6.9 for JWE RSA1_5 padding oracle… (#16633)
- fix(cli): Add gql files to the wheel build (#16637)
- docs(remote_executor/k8s): clean up source secret instructions to match EKS (#16634)
Hey,
BigQuery external tables were already detected and tagged with the `EXTERNAL_TABLE` subtype on DataHub, but their specific metadata was missing.

This PR enriches external tables with the following `customProperties` from the BQ tables:

- `external_source_format` - e.g. `PARQUET`, `CSV`, `ORC`
- `external_source_uris` - the GCS paths the table reads from
- `external_compression` - e.g. `GZIP`
- `external_max_bad_records` - tolerance for malformed rows

All values are parsed from the DDL, which is already fetched for every table (so no extra API calls, which is nice).
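
For readers who want a feel for the DDL-parsing idea, here is a minimal sketch. It is not the code from this PR: the function name `extract_external_table_properties`, the regular expressions, and the choice to join multiple URIs with commas are all assumptions made for illustration. The only things taken from BigQuery itself are the `format`, `uris`, `compression`, and `max_bad_records` options of `CREATE EXTERNAL TABLE ... OPTIONS(...)`.

```python
import re
from typing import Dict


def extract_external_table_properties(ddl: str) -> Dict[str, str]:
    """Hypothetical helper: pull a few OPTIONS(...) values out of a DDL string.

    Returns string-valued properties keyed by the names listed in the PR
    description. Missing options are simply omitted from the result.
    """
    props: Dict[str, str] = {}

    # format="PARQUET" (single or double quotes accepted)
    m = re.search(r"\bformat\s*=\s*[\"']([^\"']+)[\"']", ddl, re.IGNORECASE)
    if m:
        props["external_source_format"] = m.group(1).upper()

    # uris=["gs://...", "gs://..."]; assumption: no "]" inside a URI
    m = re.search(r"\buris\s*=\s*\[([^\]]*)\]", ddl, re.IGNORECASE)
    if m:
        uris = re.findall(r"[\"']([^\"']+)[\"']", m.group(1))
        if uris:
            # Assumption: custom properties are flat strings, so join with commas.
            props["external_source_uris"] = ",".join(uris)

    # compression="GZIP"
    m = re.search(r"\bcompression\s*=\s*[\"']([^\"']+)[\"']", ddl, re.IGNORECASE)
    if m:
        props["external_compression"] = m.group(1).upper()

    # max_bad_records=10 (unquoted integer)
    m = re.search(r"\bmax_bad_records\s*=\s*(\d+)", ddl, re.IGNORECASE)
    if m:
        props["external_max_bad_records"] = m.group(1)

    return props


if __name__ == "__main__":
    example_ddl = """
    CREATE EXTERNAL TABLE `my-project.my_dataset.events`
    OPTIONS(
      format="PARQUET",
      uris=["gs://my-bucket/events/*.parquet"],
      compression="GZIP",
      max_bad_records=10
    );
    """
    print(extract_external_table_properties(example_ddl))
```

Because the input is just the DDL string, a helper of this shape needs no additional BigQuery API calls, which matches the point made above. A regex-based approach is a simplification and would need hardening (or a real SQL parser) for unusual quoting or nested option values.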
Unit tests, linting and manual tests against a real BQ project and DataHub on Docker pass.
Thanks!