Closed
Description
Apache Iceberg version
None
Query engine
Athena v3
Please describe the bug 🐞
I'm trying to read an Iceberg table written by Athena (engine v3); I'm not sure which Iceberg version Athena uses.
When running this code:
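```python
from pyiceberg import catalog
from pyiceberg.expressions import GreaterThanOrEqual

# Load the AWS Glue catalog and check the namespace/table are visible
glue_catalog = catalog.load_glue(name='glue', conf={})
glue_catalog.list_namespaces()
glue_catalog.list_tables('data_engineering')
table = glue_catalog.load_table("data_engineering.iceberg_example_1")
scan = table.scan()
# Planning only reads table metadata and manifests: this works
files = [task.file.file_path for task in scan.plan_files()]
print(files)
# Reading the Parquet data files fails (same error with scan.to_arrow())
df_iceberg = scan.to_pandas()
print(len(df_iceberg))
```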
It fails on `df_iceberg = scan.to_pandas()` (I also tried `scan.to_arrow()`, which fails the same way).
Listing the files that belong to the table does work, so `files = [task.file.file_path for task in scan.plan_files()]` succeeds.
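Since planning succeeds while reading fails, the problem is probably in the data files rather than in the table metadata: `plan_files()` only reads manifests, whereas `to_arrow()`/`to_pandas()` open each Parquet file. A minimal sketch for narrowing it down, assuming PyIceberg's reader looks for an `iceberg.schema` entry in each file's Parquet footer key-value metadata. `files_missing_iceberg_schema` and the injected `read_footer_metadata` callable are hypothetical names, not PyIceberg APIs, and the paths below are placeholders.

```python
from typing import Callable, Iterable, List, Mapping, Optional

# Footer key assumed to match the ICEBERG_SCHEMA constant in pyiceberg/io/pyarrow.py.
ICEBERG_SCHEMA_KEY = b"iceberg.schema"

# Hypothetical reader type: given a file path, return the Parquet footer's
# key-value metadata, or None when the footer has no key-value metadata at all.
FooterReader = Callable[[str], Optional[Mapping[bytes, bytes]]]


def files_missing_iceberg_schema(
    paths: Iterable[str],
    read_footer_metadata: FooterReader,
    key: bytes = ICEBERG_SCHEMA_KEY,
) -> List[str]:
    """Return the data files whose footer lacks the embedded Iceberg schema."""
    missing = []
    for path in paths:
        metadata = read_footer_metadata(path)
        # Treat "no metadata at all" the same as "metadata without the key".
        if metadata is None or key not in metadata:
            missing.append(path)
    return missing


if __name__ == "__main__":
    # Stubbed footers standing in for real Parquet files (placeholder paths).
    fake_footers = {
        "s3://bucket/a.parquet": None,
        "s3://bucket/b.parquet": {b"other": b"x"},
        "s3://bucket/c.parquet": {ICEBERG_SCHEMA_KEY: b"{}"},
    }
    print(files_missing_iceberg_schema(fake_footers, fake_footers.get))
```

Against the real table, `paths` could be the `files` list from the script above, and `read_footer_metadata` something like `lambda p: pyarrow.parquet.read_schema(p, filesystem=fs).metadata` with a configured PyArrow filesystem `fs` (an assumption: filesystem setup, credentials and S3 path handling are omitted). Injecting the reader keeps the check testable without S3 access.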
The error is the following:
```
Traceback (most recent call last):
  File "/Users/nicor88/deng-swiss-knife/icerberg/get_data.py", line 31, in <module>
    df_iceberg = scan.to_arrow()
  File "/Users/nicor88/deng-swiss-knife/venv/lib/python3.9/site-packages/pyiceberg/table/__init__.py", line 341, in to_arrow
    return project_table(
  File "/Users/nicor88/deng-swiss-knife/venv/lib/python3.9/site-packages/pyiceberg/io/pyarrow.py", line 508, in project_table
    schema_raw = parquet_schema.metadata.get(ICEBERG_SCHEMA)
AttributeError: 'NoneType' object has no attribute 'get'
```
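The last frame shows the immediate cause: `parquet_schema.metadata` is `None`, i.e. the schema read from the data file has no key-value metadata at all, so calling `.get(ICEBERG_SCHEMA)` on it raises the `AttributeError`. A defensive lookup could fall back to the table's own schema in that case. This is a minimal sketch, assuming the key is `b"iceberg.schema"`, that the embedded value is JSON, and that schemas are represented as plain dicts; `resolve_file_schema` is a hypothetical helper, not PyIceberg's actual fix.

```python
import json
from typing import Any, Dict, Mapping, Optional

# Assumed to match the ICEBERG_SCHEMA constant in pyiceberg/io/pyarrow.py.
ICEBERG_SCHEMA = b"iceberg.schema"


def resolve_file_schema(
    footer_metadata: Optional[Mapping[bytes, bytes]],
    table_schema: Dict[str, Any],
) -> Dict[str, Any]:
    """Hypothetical helper: prefer the schema embedded in the file footer,
    otherwise fall back to the table's current schema."""
    # Calling .get on None is what raises the AttributeError in the traceback,
    # so handle "no metadata at all" before looking up the key.
    if footer_metadata is None:
        return table_schema
    raw = footer_metadata.get(ICEBERG_SCHEMA)
    if raw is None:
        return table_schema
    # The embedded value is assumed to be JSON; json.loads accepts bytes.
    return json.loads(raw)


if __name__ == "__main__":
    table_schema = {
        "type": "struct",
        "schema-id": 0,
        "fields": [{"id": 1, "name": "user_id", "required": False, "type": "int"}],
    }
    # A file whose footer has no key-value metadata now falls back cleanly.
    print(resolve_file_schema(None, table_schema) == table_schema)
```

Falling back to the table schema only helps if columns can still be matched to the data file by something other than the embedded schema (for example Parquet field IDs); this sketch covers the lookup alone.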
An example table can be created like this:
```sql
create table
  data_engineering.iceberg_example_1
with (
  table_type='iceberg',
  is_external=false,
  location='s3://xxxx/iceberg_1',
  partitioning=ARRAY['creation_date', 'bucket(user_id, 5)'],
  format='parquet',
  vacuum_max_snapshot_age_seconds=86400,
  optimize_rewrite_delete_file_threshold=2
)
as
with data as (
  select
    1 as user_id,
    'pi' as user_name,
    'active' as status,
    17.89 as cost,
    1 as quantity,
    100000000 as quantity_big,
    cast(cast(from_unixtime(to_unixtime(now())) as timestamp(6)) as date) as creation_date,
    cast(from_unixtime(to_unixtime(now())) as timestamp(6)) as created_at,
    cast(from_unixtime(to_unixtime(now())) as timestamp(6)) as updated_at
  union all
  select
    2 as user_id,
    'beta' as user_name,
    'inactive' as status,
    3 as cost,
    5 as quantity,
    100000000 as quantity_big,
    cast(cast(from_unixtime(to_unixtime(now())) as timestamp(6)) as date) as creation_date,
    cast(from_unixtime(to_unixtime(now())) as timestamp(6)) as created_at,
    cast(from_unixtime(to_unixtime(now())) as timestamp(6)) as updated_at
)
select
  user_id,
  user_name,
  status,
  cost,
  quantity,
  quantity_big,
  creation_date,
  created_at,
  cast(from_unixtime(to_unixtime(now())) as timestamp(6)) as inserted_at
from data
```
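The table above is partitioned by `creation_date` and `bucket(user_id, 5)`. As I understand the Iceberg spec's bucket transform, an integer value is widened to a long, hashed with 32-bit Murmur3 (x86 variant, seed 0) over its 8 little-endian bytes, and mapped to `(hash & Integer.MAX_VALUE) % N`. The pure-Python sketch below only illustrates which bucket a `user_id` lands in; `murmur3_x86_32` and `bucket_int` are hypothetical names, and a real reader should rely on its library's own transform implementation.

```python
import struct

MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit unsigned integer left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_x86_32(data: bytes, seed: int = 0) -> int:
    """32-bit MurmurHash3 (x86 variant), returned as a signed 32-bit int."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    n_blocks = len(data) // 4
    for i in range(n_blocks):
        # Mix each 4-byte little-endian block into the running hash.
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
        h = (_rotl32(h, 13) * 5 + 0xE6546B64) & MASK32
    tail = data[4 * n_blocks:]
    if tail:
        # Remaining 1-3 bytes, read little-endian and mixed without the h rotation.
        k = int.from_bytes(tail, "little")
        k = _rotl32((k * c1) & MASK32, 15)
        h ^= (k * c2) & MASK32
    # Finalization mix (fmix32).
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h - (1 << 32) if h & 0x80000000 else h


def bucket_int(value: int, num_buckets: int) -> int:
    """Bucket for an int/long column: hash the value as an 8-byte little-endian long."""
    h = murmur3_x86_32(struct.pack("<q", value))
    return (h & 0x7FFFFFFF) % num_buckets


if __name__ == "__main__":
    for user_id in (1, 2):  # the two rows created by the CTAS above
        print(user_id, bucket_int(user_id, 5))
```

Masking with `0x7FFFFFFF` before the modulo keeps the result non-negative even when the signed hash is negative, which is why the spec uses `Integer.MAX_VALUE` rather than `abs()`.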