[Bug]: Nessie GC is not deleting files from S3 bucket after GC/delete command (expiry and orphan file clean up) #9042

@schobe

What happened

Hi team,

We are trying to expire old snapshots and delete orphan files using the Nessie GC feature. Our Iceberg tables are stored in an S3 bucket, so after running Nessie GC we expect the expired files (data as well as metadata) to be deleted from S3. However, the physical files are still there after running GC and after running explicit delete commands. In addition, we encountered the following error while running Nessie GC.

Caused by: java.lang.RuntimeException: Failed to get paths from manifest file location.avro
    at org.projectnessie.gc.iceberg.IcebergContentToFiles.allDataAndDeleteFiles(IcebergContentToFiles.java:225)
    at org.projectnessie.gc.iceberg.IcebergContentToFiles.lambda$allManifestsAndDataFiles$2(IcebergContentToFiles.java:202)

Caused by: java.lang.IllegalArgumentException: Cannot parse partition spec fields, not an array: {"spec-id":0,"fields":[]}
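The message suggests that the parser expected a JSON array of partition fields but found a JSON object. Iceberg manifests are Avro object container files, and their header metadata carries a `partition-spec` entry, so one way to narrow this down is to look at that entry in the affected manifest. The following is a minimal sketch using only the Python standard library, assuming the manifest has first been downloaded from S3 to a local file; the function names are our own and are not part of Nessie or Iceberg.

```python
import json

AVRO_MAGIC = b"Obj\x01"  # first four bytes of every Avro object container file


def _read_long(stream):
    """Decode one Avro long (zigzag-encoded, variable-length)."""
    raw = 0
    shift = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unexpected end of Avro header")
        raw |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:
            break
        shift += 7
    return (raw >> 1) ^ -(raw & 1)


def _read_bytes(stream):
    """Read a length-prefixed Avro bytes/string value."""
    length = _read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of Avro header")
    return data


def read_avro_header_metadata(stream):
    """Return the header metadata map (str -> bytes) of an Avro container file."""
    if stream.read(4) != AVRO_MAGIC:
        raise ValueError("not an Avro object container file")
    metadata = {}
    while True:
        count = _read_long(stream)
        if count == 0:  # a zero count terminates the map
            break
        if count < 0:  # negative count: a block byte size follows, then |count| entries
            _read_long(stream)
            count = -count
        for _ in range(count):
            key = _read_bytes(stream).decode("utf-8")
            metadata[key] = _read_bytes(stream)
    return metadata


def partition_spec_shape(metadata):
    """Classify the JSON stored under 'partition-spec': 'array', 'object', ..."""
    raw = metadata.get("partition-spec")
    if raw is None:
        return "missing"
    value = json.loads(raw)
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "object"
    return type(value).__name__
```

Opening the downloaded manifest in binary mode and passing the file object to `read_avro_header_metadata`, then to `partition_spec_shape`, would show whether the stored value is an array or an object such as `{"spec-id":0,"fields":[]}`. If it is an object, that would match the exception above, although it does not by itself explain which writer produced the file.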

Can you please help us to understand this issue?

How to reproduce it

  1. Create an Iceberg table on S3 using the Iceberg REST API via Nessie (hosted in a Docker container).
  2. Add some records to the table.
  3. Get the current snapshot ID.
  4. Delete records from the table.
  5. Get the current snapshot ID again.
  6. Run Nessie GC with the following Docker Compose service:
     nessie-gc:
       image: ghcr.io/projectnessie/nessie-gc:0.91.2
       ports:
         - "5435:5435"
       depends_on:
         nessie:
           condition: service_healthy
       command: gc --uri http://nessie:19120/api/v2 --iceberg=s3.access-key-id=${S3_KEY} --iceberg=s3.secret-access-key=${S3_SECRET} --jdbc --jdbc-url=${JDBC_URL} --jdbc-user=${JDBC_USERNAME} --jdbc-password=${JDBC_PASSWORD}
  7. Check the GC log (docker logs <container-id>).
  8. Check in the Postgres tables whether the references and live sets have been deleted.
  9. Check in the S3 location whether the expired files, or the files marked for deletion, have been deleted.
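The last check can be made less error-prone by comparing two object listings, one captured before the GC run and one after. The following is a minimal sketch, assuming the listings are plain-text output of `aws s3 ls --recursive` (without `--human-readable`), where each line holds a date, a time, a size in bytes, and the object key; the function names are hypothetical helpers, not part of any tool mentioned here.

```python
def parse_s3_listing(listing):
    """Extract object keys from `aws s3 ls --recursive` text output."""
    keys = set()
    for line in listing.splitlines():
        # Columns: date, time, size, key. The key may contain spaces,
        # so split at most three times and keep the remainder whole.
        parts = line.split(None, 3)
        if len(parts) == 4 and parts[2].isdigit():
            keys.add(parts[3])
    return keys


def gc_deletion_report(before, after):
    """Compare two listings and report which keys disappeared and which remain."""
    before_keys = parse_s3_listing(before)
    after_keys = parse_s3_listing(after)
    return {
        "deleted": sorted(before_keys - after_keys),
        "retained": sorted(before_keys & after_keys),
    }
```

An empty `deleted` list after a GC run would correspond to the behavior described in this report, where no physical files were removed.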

Nessie server type (docker/uber-jar/built from source) and version

ghcr.io/projectnessie/nessie:0.90.4

Client type (Ex: UI/Spark/pynessie ...) and version

No response

Additional information

No response
