Reproduction was created on base of the https://github.com/Altinity/clickhouse-sql-examples/tree/main/iceberg/iceberg-with-minio
docker compose up -d
docker restart iceberg-clickhouse-integration-rest-1
It creates iceberg table, adds 4 rows into it, drop the table, recreates it, adds one row to recreated table.
python3 iceberg_recreate_table.py
docker exec -it clickhouse bash
clickhouse client
When I drop the table and recreate it (with same name), I still see old data when reading it from clickhouse.
Reproduction:
- Bring up containers
- Run the script
Connect to the catalog
Create namespace iceberg
--Already exists
List namespaces
('iceberg',)
List tables
('iceberg', 'names') <class 'tuple'>
Dropping names table if it exists
Add four rows of data to iceberg.names
Table read with pyiceberg:
name age
0 Alice 17.0
1 Alice 19.0
2 Bob 20.0
3 David 22.0
- Connect to clickhouse client (in other terminal window) in container and read the same table using Iceberg engine
DROP DATABASE IF EXISTS datalake;
SET allow_experimental_database_iceberg=true;
CREATE DATABASE datalake
ENGINE = Iceberg('http://rest:8181/v1', 'minio', 'minio123')
SETTINGS catalog_type = 'rest', storage_endpoint = 'http://minio:9000/warehouse', warehouse = 'iceberg' ;
SELECT * FROM datalake.`iceberg.names`;
Output:
┌─name──┬─age─┐
1. │ David │ 22 │
└───────┴─────┘
┌─name──┬─age─┐
2. │ Alice │ 17 │
3. │ Alice │ 19 │
└───────┴─────┘
┌─name─┬─age─┐
4. │ Bob │ 20 │
└──────┴─────┘
4 rows in set. Elapsed: 0.026 sec.
- Return to script window and press enter
Output:
Drop table iceberg.names
Recreate the table iceberg.names
Table read with pyiceberg:
Empty DataFrame
Columns: [name, age]
Index: []
- Again connect to clickhouse client (in other terminal window) in container and read the same table using Iceberg engine
DROP DATABASE IF EXISTS datalake;
SET allow_experimental_database_iceberg=true;
CREATE DATABASE datalake
ENGINE = Iceberg('http://rest:8181/v1', 'minio', 'minio123')
SETTINGS catalog_type = 'rest', storage_endpoint = 'http://minio:9000/warehouse', warehouse = 'iceberg' ;
SELECT * FROM datalake.`iceberg.names`;
Output:
┌─name─┬─age─┐
1. │ Bob │ 20 │
└──────┴─────┘
┌─name──┬─age─┐
2. │ Alice │ 17 │
3. │ Alice │ 19 │
└───────┴─────┘
┌─name──┬─age─┐
4. │ David │ 22 │
└───────┴─────┘
4 rows in set. Elapsed: 0.019 sec.
Here I expect to see empty table.
- Return to script window and press enter
Output:
Script paused. Press Enter to continue...
Add one row of data to iceberg.names
Table read with pyiceberg:
name age
0 Simon 30.0
- Again connect to clickhouse client (in other terminal window) in container and read the same table using Iceberg engine
DROP DATABASE IF EXISTS datalake;
SET allow_experimental_database_iceberg=true;
CREATE DATABASE datalake
ENGINE = Iceberg('http://rest:8181/v1', 'minio', 'minio123')
SETTINGS catalog_type = 'rest', storage_endpoint = 'http://minio:9000/warehouse', warehouse = 'iceberg' ;
SELECT * FROM datalake.`iceberg.names`;
Output:
┌─name──┬─age─┐
1. │ David │ 22 │
└───────┴─────┘
┌─name──┬─age─┐
2. │ Alice │ 17 │
3. │ Alice │ 19 │
└───────┴─────┘
┌─name─┬─age─┐
4. │ Bob │ 20 │
└──────┴─────┘
4 rows in set. Elapsed: 0.020 sec.
But I expect to see one row as I see when reading with pyiceberg.
I expecrince the same issue when reading data with iceberg table function as well. I use following query:
SELECT * FROM iceberg('http://minio:9000/warehouse/data')