
Google BigLake catalog integration #97104

Merged
scanhex12 merged 12 commits into ClickHouse:master from scanhex12:biglake2
Feb 20, 2026

Conversation

@scanhex12
Member

Changelog category (leave one):

  • New Feature

Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):

Google BigLake catalog integration. This closes #95339.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@clickhouse-gh
Contributor

clickhouse-gh bot commented Feb 16, 2026

Workflow [PR], commit [102691a]

Summary:

| job_name | test_name | status | info | comment |
| --- | --- | --- | --- | --- |
| Stateless tests (arm_asan, azure, parallel) | | failure | | |
| | 03733_async_insert_not_supported | FAIL | cidb | |
| | 00282_merging | FAIL | cidb | |
| | 00538_datediff | FAIL | cidb | |
| | 03166_skip_indexes_vertical_merge_1 | FAIL | cidb | |
| | 02771_multidirectory_globs_storage_file | FAIL | cidb | |
| | Logical error: Tables UUID does not match after RESTART REPLICA (old: A, new: B) (STID: 4373-3c9f) | FAIL | cidb | |

@clickhouse-gh clickhouse-gh bot added the pr-feature Pull request with new product feature label Feb 16, 2026
@scanhex12
Member Author

Demo:

:) SET allow_database_iceberg=1;

SET allow_database_iceberg = 1

Query id: 01534cff-e477-48fb-93d8-45f2b008ed06

Ok.

0 rows in set. Elapsed: 0.000 sec. 

:) CREATE DATABASE biglake_db2 
ENGINE = DataLakeCatalog('https://biglake.googleapis.com/iceberg/v1/restcatalog')
SETTINGS
    catalog_type = 'biglake',
    google_adc_client_id = '<>',
    google_adc_client_secret = '<>',
    google_adc_refresh_token = '<>',
    google_adc_quota_project_id = '<>',
    warehouse = 'gs://biglake-public-nyc-taxi-iceberg';

CREATE DATABASE biglake_db2
ENGINE = DataLakeCatalog('https://biglake.googleapis.com/iceberg/v1/restcatalog')
SETTINGS catalog_type = 'biglake', google_adc_client_id = '<>', google_adc_client_secret = '<>', google_adc_refresh_token = '<>', google_adc_quota_project_id = 'support-services-<>', warehouse = 'gs://biglake-public-nyc-taxi-iceberg'

Query id: 8edfa47b-07e8-453f-94ef-b76e29801c2e

Ok.

0 rows in set. Elapsed: 0.002 sec. 

:) show tables from biglake_db2;

SHOW TABLES FROM biglake_db2

Query id: 4b72e8a2-622d-45fa-a808-6c56d4c192a0

   ┌─name─────────────────────────┐
1. │ public_data.nyc_taxicab      │
2. │ public_data.nyc_taxicab_2021 │
   └──────────────────────────────┘

2 rows in set. Elapsed: 3.839 sec. 

:) select * from biglake_db2.`public_data.nyc_taxicab` limit 1;

SELECT *
FROM biglake_db2.`public_data.nyc_taxicab`
LIMIT 1

Query id: 94577765-3c03-465d-8e19-5bbd05a13c4c

Row 1:
──────
vendor_id:           1
pickup_datetime:     2016-11-17 15:43:00.000000
dropoff_datetime:    2016-11-17 15:43:05.000000
passenger_count:     1
trip_distance:       0
rate_code:           1
store_and_fwd_flag:  N
payment_type:        1
fare_amount:         0
extra:               0
mta_tax:             0
tip_amount:          0
tolls_amount:        0
imp_surcharge:       0
airport_fee:         ᴺᵁᴸᴸ
total_amount:        0
pickup_location_id:  236
dropoff_location_id: 236
data_file_year:      2016
data_file_month:     11

1 row in set. Elapsed: 6.716 sec. 

:) \q
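For readers unfamiliar with the `google_adc_*` settings above: they correspond to Google's standard OAuth 2.0 refresh-token grant, where a long-lived refresh token is traded for a short-lived access token at `https://oauth2.googleapis.com/token`. A minimal sketch of building that token request (illustrative only; the values stand in for the `'<>'` placeholders and this is not ClickHouse's internal implementation):

```python
from urllib.parse import urlencode

# Google's OAuth 2.0 token endpoint (standard, not ClickHouse-specific).
GOOGLE_TOKEN_ENDPOINT = "https://oauth2.googleapis.com/token"

def build_refresh_request(client_id: str, client_secret: str, refresh_token: str) -> str:
    """Build the form-encoded body for the OAuth 2.0 refresh-token grant."""
    return urlencode({
        "grant_type": "refresh_token",
        "client_id": client_id,
        "client_secret": client_secret,
        "refresh_token": refresh_token,
    })

# Hypothetical values in place of the real credentials.
body = build_refresh_request("my-client-id", "my-secret", "my-refresh-token")
print(body)
```

The real exchange POSTs this body to the token endpoint and reads `access_token` from the JSON response, which is then sent as a Bearer token to the BigLake REST catalog.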

@scanhex12
Member Author

TODO: add a setting to read the auth values from a JSON file.
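For context, the file this TODO refers to is gcloud's `application_default_credentials.json`, which for user credentials is a small JSON document of type `"authorized_user"`. A sketch of extracting the fields the refresh-token flow needs, assuming that standard layout (the example values are hypothetical, and the actual handling inside ClickHouse may differ):

```python
import json

# Sketch of an authorized-user ADC file as written by
# `gcloud auth application-default login` (values are made up).
ADC_EXAMPLE = """
{
  "type": "authorized_user",
  "client_id": "example-client-id.apps.googleusercontent.com",
  "client_secret": "example-secret",
  "refresh_token": "example-refresh-token"
}
"""

def load_adc(text: str) -> dict:
    """Parse an ADC document and keep the fields the token refresh needs."""
    creds = json.loads(text)
    if creds.get("type") != "authorized_user":
        raise ValueError(f"unsupported credential type: {creds.get('type')}")
    return {k: creds[k] for k in ("client_id", "client_secret", "refresh_token")}

creds = load_adc(ADC_EXAMPLE)
print(creds["client_id"])
```

The second demo below shows this wired up through the `google_adc_credentials_file` setting.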

@divanik divanik self-assigned this Feb 16, 2026
@pashandor789

Cool fix, scanhex. That's a very important fix.

@scanhex12 scanhex12 requested a review from divanik February 18, 2026 10:49
@scanhex12
Member Author

Demo with file:

:) CREATE DATABASE biglake_db3 
ENGINE = DataLakeCatalog('https://biglake.googleapis.com/iceberg/v1/restcatalog')
SETTINGS
    catalog_type = 'biglake',
    google_adc_credentials_file = '/Users/konstantinvedernikov/.config/gcloud/application_default_credentials.json',
    warehouse = 'gs://biglake-public-nyc-taxi-iceberg';

CREATE DATABASE biglake_db3
ENGINE = DataLakeCatalog('https://biglake.googleapis.com/iceberg/v1/restcatalog')
SETTINGS catalog_type = 'biglake', google_adc_credentials_file = '/Users/konstantinvedernikov/.config/gcloud/application_default_credentials.json', warehouse = 'gs://biglake-public-nyc-taxi-iceberg'

Query id: 8d3b5daf-c228-469c-8a55-142516c8b1d3

Ok.

0 rows in set. Elapsed: 0.001 sec. 

:) select * from biglake_db3.`public_data.nyc_taxicab` limit 1;

SELECT *
FROM biglake_db3.`public_data.nyc_taxicab`
LIMIT 1

Query id: a872cc76-4bd1-4855-9711-d3df3e341c99

Row 1:
──────
vendor_id:           1
pickup_datetime:     2016-11-17 15:43:00.000000
dropoff_datetime:    2016-11-17 15:43:05.000000
passenger_count:     1
trip_distance:       0
rate_code:           1
store_and_fwd_flag:  N
payment_type:        1
fare_amount:         0
extra:               0
mta_tax:             0
tip_amount:          0
tolls_amount:        0
imp_surcharge:       0
airport_fee:         ᴺᵁᴸᴸ
total_amount:        0
pickup_location_id:  236
dropoff_location_id: 236
data_file_year:      2016
data_file_month:     11

1 row in set. Elapsed: 4.943 sec. 

:) show tables from biglake_db3;

SHOW TABLES FROM biglake_db3

Query id: 2694fb46-af41-4090-b1d2-4d4391f1d7ae

   ┌─name─────────────────────────┐
1. │ public_data.nyc_taxicab      │
2. │ public_data.nyc_taxicab_2021 │
   └──────────────────────────────┘

2 rows in set. Elapsed: 2.964 sec. 

Member

@divanik divanik left a comment


Some minor changes

@scanhex12 scanhex12 requested a review from divanik February 19, 2026 11:32
@scanhex12 scanhex12 enabled auto-merge February 19, 2026 18:54
@scanhex12 scanhex12 disabled auto-merge February 20, 2026 10:09
@scanhex12 scanhex12 enabled auto-merge February 20, 2026 10:09
@scanhex12 scanhex12 added this pull request to the merge queue Feb 20, 2026
Merged via the queue into ClickHouse:master with commit a199432 Feb 20, 2026
142 of 146 checks passed
@scanhex12 scanhex12 deleted the biglake2 branch February 20, 2026 10:26
@robot-ch-test-poll4 robot-ch-test-poll4 added the pr-synced-to-cloud The PR is synced to the cloud repo label Feb 20, 2026
Algunenano added a commit to Algunenano/ClickHouse that referenced this pull request Mar 16, 2026
Add checklist item 8 (server-side file access & path traversal) to the
code review skill's risk checklist, and a corresponding blocker severity
criterion. This ensures the reviewer flags cases where user-controlled
file paths (e.g. from SQL settings) reach `ReadBufferFromFile` or
similar without `user_files_path` validation or equivalent restrictions.

Motivated by a missed finding in PR ClickHouse#97104 where
`google_adc_credentials_file` allowed arbitrary server-side file reads.

Co-Authored-By: Claude Opus 4.6 (1M context) <[email protected]>
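The traversal risk described above is generic: any user-supplied path that the server opens should first be resolved and confined to an allowed directory such as `user_files_path`. A hypothetical check illustrating the idea (not ClickHouse's actual code):

```python
import os

def is_under_user_files(path: str, user_files_path: str) -> bool:
    """Return True only if the resolved path stays inside user_files_path."""
    base = os.path.realpath(user_files_path)
    # Resolve relative to the base so ".." components are normalized away
    # before the prefix check; absolute paths escape the join and get rejected.
    target = os.path.realpath(os.path.join(base, path))
    return target == base or target.startswith(base + os.sep)

# A relative credentials file inside the allowed directory is accepted...
assert is_under_user_files("creds/adc.json", "/var/lib/clickhouse/user_files")
# ...while traversal out of it is rejected.
assert not is_under_user_files("../../etc/shadow", "/var/lib/clickhouse/user_files")
```

Without a check of this kind, a setting like `google_adc_credentials_file` effectively lets any user with CREATE DATABASE rights read arbitrary files on the server.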

Labels

pr-feature: Pull request with new product feature
pr-synced-to-cloud: The PR is synced to the cloud repo


Development

Successfully merging this pull request may close these issues.

Add support for Iceberg Rest Catalog in BigLake

6 participants