Skip to content

Fix FileIO usage in PolarisRestCatalogIntegrationBase#3658

Merged
adutra merged 2 commits intoapache:mainfrom
adutra:fix-file-io-integration-tests
Feb 6, 2026
Merged

Fix FileIO usage in PolarisRestCatalogIntegrationBase#3658
adutra merged 2 commits intoapache:mainfrom
adutra:fix-file-io-integration-tests

Conversation

@adutra
Copy link
Contributor

@adutra adutra commented Feb 3, 2026

Currently, all tests inheriting from PolarisRestCatalogIntegrationBase are forcibly using InMemoryFileIO for all RESTCatalog instances.

Furthermore, this class wasn't overriding CatalogTests.baseTableLocation(), so tests calling this method were actually operating on file:// paths all the time.

Because of that, some child test classes, and in particular, the MiniIO-based ones, weren't actually testing anything useful.

This PR changes PolarisRestCatalogIntegrationBase to use a configurable FileIO impl, defaulting to ResolvingFileIO.

This change uncovered a few latent issues with current tests, and notably:

  1. Tests using MinIO should request credential vending consistently, otherwise most tests would fail because the S3 client attempts to load credentials from the environment.

  2. Tests calling registerTable need special handling since this endpoint does NOT support access delegation at all. For these tests, the RESTCatalog clients need to be given direct credentials for S3 – since the server won't give them anything – even with vended credentials requested.

Incidentally, this PR simplifies IntegrationTestsHelper.mergeFromAnnotatedElements() – but its functionality stays the same.

Checklist

  • 🛡️ Don't disclose security issues! (contact [email protected])
  • 🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
  • 🧪 Added/updated tests with good coverage, or manually tested (and explained how)
  • 💡 Added comments for complex logic
  • 🧾 Updated CHANGELOG.md (if needed)
  • 📚 Updated documentation in site/content/in-dev/unreleased (if needed)

@adutra
Copy link
Contributor Author

adutra commented Feb 3, 2026

Heads up: this change may have implications for the following cloud tests:

  • RestCatalogS3IT
  • RestCatalogGcsIT
  • RestCatalogAdlsIT
  • RestCatalogAdlsCredentialVendingIT

However I cannot test them 🤷‍♂️

The first 3 are probably fine: credential vending is implicitly enabled, but the clients don't request any access delegation mode by default. This should work as long as the RESTCatalog clients have access to their own storage credentials via the environment (which seems to be the intent).

The last one, RestCatalogAdlsCredentialVendingIT, explicitly enables credentials vending, but in fact, behaves exactly like RestCatalogAdlsIT, i.e., the RESTCatalog clients are not configured to request any delegation mode. Imho this test is lacking the annotation:

@RestCatalogConfig({"header.X-Iceberg-Access-Delegation", "vended-credentials"})

}

@Override
protected Map<String, String> extraCatalogProperties(TestInfo testInfo) {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is admittedly hacky, but I couldn't find a better way to tweak just this test's RESTCatalog without introducing unwanted S3 credentials in all other tests. Unfortunately we can't annotate the test with @RestCatalogConfig because endpoint is computed dynamically.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Heads up: with the new register table endpoint with access delegation coming in Iceberg 1.11, this trick will become obsolete.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IcebergCatalogHandler.registerTable implementation just behaves differently than IcebergCatalogHandler.loadTable. Both return a LoadTableResponse, but the former doesn't vend credentials. I think, registerTable could easily do that, because the table exists when it returns?
(A change for a different PR though)

@dimas-b
Copy link
Contributor

dimas-b commented Feb 3, 2026

The last one, RestCatalogAdlsCredentialVendingIT, explicitly enables credentials vending, but in fact, behaves exactly like RestCatalogAdlsIT, i.e., the RESTCatalog clients are not configured to request any delegation mode. Imho this test is lacking the annotation: [...] @RestCatalogConfig({"header.X-Iceberg-Access-Delegation", "vended-credentials"})

It might be good to add that annotation, yet RestCatalogAdlsCredentialVendingIT behaves differently from RestCatalogAdlsIT. I added the former specifically to test hierarchical SAS tokens in ADLS. The latter test does not exercise storage integration code at all, but the former does.

dimas-b
dimas-b previously approved these changes Feb 3, 2026
Copy link
Contributor

@dimas-b dimas-b left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for finding and fixing this problem, @adutra ! The change seems worth merging even with the hack in test code 👍

Comment on lines +101 to +104
if (testInfo.getTestMethod().orElseThrow().getName().equals("testRegisterTable")) {
// This test registers a table – operation that doesn't support access delegation –
// then attempts to use the table's FileIO. This can only work if the client has its
// own S3 credentials.
Copy link
Contributor

@dimas-b dimas-b Feb 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting... Is it possible that Polaris neglects to return the credentials in the registerTable response, or is it a problem in the IRC API spec?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a problem in the spec: the registerTable endpoint does not declare the data-access header parameter:

https://github.com/apache/iceberg/blob/d0654440a4cd00feaff73c53cd3ef9fb787b92dc/open-api/rest-catalog-open-api.yaml#L883-L898

I will reach out to the Iceberg ML

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Feb 3, 2026
@adutra
Copy link
Contributor Author

adutra commented Feb 3, 2026

It might be good to add that annotation, yet RestCatalogAdlsCredentialVendingIT behaves differently from RestCatalogAdlsIT. I added the former specifically to test hierarchical SAS tokens in ADLS. The latter test does not exercise storage integration code at all, but the former does.

Just curious, what triggers the usage of a hierarchical SAS token in the RestCatalogAdlsCredentialVendingIT test? As far as I can tell, both tests are identical since SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION is false by default.

In any case, let's do this change in a follow-up PR.

@dimas-b
Copy link
Contributor

dimas-b commented Feb 3, 2026

Just curious, what triggers the usage of a hierarchical SAS token in the RestCatalogAdlsCredentialVendingIT test?

This:

.put("polaris.features.\"SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION\"", "false")

Vended credentials may not be used by clients (as you noted), but the server-side call path when it interacts with storage is the same as for vended credentials (i.e. it vends credentials to itself).

@adutra
Copy link
Contributor Author

adutra commented Feb 4, 2026

the server-side call path when it interacts with storage is the same as for vended credentials (i.e. it vends credentials to itself).

Sure, but in RestCatalogAdlsIT too the server-side credential vending is enabled, since SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION is false by default. I'm not seeing any difference between these two tests 😅

@dimas-b
Copy link
Contributor

dimas-b commented Feb 4, 2026

SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION is false by default [...]

I guess it's set to true for integration tests here:

polaris.features."SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION"=true

It's a bit convoluted, indeed 😅

@adutra
Copy link
Contributor Author

adutra commented Feb 5, 2026

It's a bit convoluted, indeed 😅

Aaah right, I missed that! Thank you for the explanation.

Currently, all tests inheriting from `PolarisRestCatalogIntegrationBase` are forcibly using `InMemoryFileIO` for all `RESTCatalog` instances.

Furthermore, this class wasn't overriding `CatalogTests.baseTableLocation()`, so tests calling this method were actually operating on `file://` paths all the time.

Because of that, some child test classes, and in particular, the MiniIO-based ones, weren't actually testing anything useful.

This PR changes `PolarisRestCatalogIntegrationBase` to use a configurable `FileIO` impl, defaulting to `ResolvingFileIO`.

This change uncovered a few latent issues with current tests, and notably:

1) Tests using MinIO should request credential vending consistently, otherwise most tests would fail because the S3 client attempts to load credentials from the environment.

2) Tests calling `registerTable` need special handling since this endpoint does NOT support access delegation at all. For these tests, the `RESTCatalog` clients need to be given direct credentials for S3 – since the server won't give them anything – even with vended credentials requested.

Incidentally, this PR simplifies `IntegrationTestsHelper.mergeFromAnnotatedElements()` – but its functionality stays the same.
@adutra
Copy link
Contributor Author

adutra commented Feb 5, 2026

@dimas-b as a follow up to your explanation, I went ahead and added @RestCatalogConfig({"header.X-Iceberg-Access-Delegation", "vended-credentials"}) to RestCatalogAdlsCredentialVendingIT as it most likely required for this test to pass.

Can you have another look please?

Copy link
Member

@snazy snazy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The change LGTM.
Thanks for fixing the tests!

I can live with a hack in tests, but I think we can get vended credentials from a register-table call even without waiting for the new spec. Polaris can behave as if the header is absent - or just evaluate the header as load-table does.

}

@Override
protected Map<String, String> extraCatalogProperties(TestInfo testInfo) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IcebergCatalogHandler.registerTable implementation just behaves differently than IcebergCatalogHandler.loadTable. Both return a LoadTableResponse, but the former doesn't vend credentials. I think, registerTable could easily do that, because the table exists when it returns?
(A change for a different PR though)

@adutra adutra merged commit 26a394e into apache:main Feb 6, 2026
15 checks passed
@adutra adutra deleted the fix-file-io-integration-tests branch February 6, 2026 12:35
@github-project-automation github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board Feb 6, 2026
sungwy pushed a commit to sungwy/polaris that referenced this pull request Feb 7, 2026
Currently, all tests inheriting from `PolarisRestCatalogIntegrationBase` are forcibly using `InMemoryFileIO` for all `RESTCatalog` instances.

Furthermore, this class wasn't overriding `CatalogTests.baseTableLocation()`, so tests calling this method were actually operating on `file://` paths all the time.

Because of that, some child test classes, and in particular, the MiniIO-based ones, weren't actually testing anything useful.

This PR changes `PolarisRestCatalogIntegrationBase` to use a configurable `FileIO` impl, defaulting to `ResolvingFileIO`.

This change uncovered a few latent issues with current tests, and notably:

1) Tests using MinIO should request credential vending consistently, otherwise most tests would fail because the S3 client attempts to load credentials from the environment.

2) Tests calling `registerTable` need special handling since this endpoint does NOT support access delegation at all. For these tests, the `RESTCatalog` clients need to be given direct credentials for S3 – since the server won't give them anything – even with vended credentials requested.

Incidentally, this PR simplifies `IntegrationTestsHelper.mergeFromAnnotatedElements()` – but its functionality stays the same.
@MonkeyCanCode MonkeyCanCode mentioned this pull request Feb 7, 2026
6 tasks
snazy added a commit to snazy/polaris that referenced this pull request Feb 11, 2026
* Remove "giant" constructors from Handlers and Adapters (apache#3669)

This PR attempts to reduce the "giant constructor" symptoms in catalog adapters and handlers.
* The `CatalogHandler` subtypes are now `@PolarisImmutable` objects, and thus can leverage the generated builder to create instances without the need for big constructors.
* The handlers are now produced by a new factory, which is a CDI request-scoped bean with direct field injection to avoid constructor injection.

* Fix FileIO usage in PolarisRestCatalogIntegrationBase (apache#3658)

Currently, all tests inheriting from `PolarisRestCatalogIntegrationBase` are forcibly using `InMemoryFileIO` for all `RESTCatalog` instances.

Furthermore, this class wasn't overriding `CatalogTests.baseTableLocation()`, so tests calling this method were actually operating on `file://` paths all the time.

Because of that, some child test classes, and in particular, the MiniIO-based ones, weren't actually testing anything useful.

This PR changes `PolarisRestCatalogIntegrationBase` to use a configurable `FileIO` impl, defaulting to `ResolvingFileIO`.

This change uncovered a few latent issues with current tests, and notably:

1) Tests using MinIO should request credential vending consistently, otherwise most tests would fail because the S3 client attempts to load credentials from the environment.

2) Tests calling `registerTable` need special handling since this endpoint does NOT support access delegation at all. For these tests, the `RESTCatalog` clients need to be given direct credentials for S3 – since the server won't give them anything – even with vended credentials requested.

Incidentally, this PR simplifies `IntegrationTestsHelper.mergeFromAnnotatedElements()` – but its functionality stays the same.

* Automatically adjust gradlew during Gradle Wrapper updates (apache#3620)

* CI for feature branches (apache#3686)

This change enables CI on branches matching the pattern `feature/*`.

This change does _not_ require PRs against `feature/*` branches to have "green" CI.

* fix: Correct schema version number in schema-v4.sql files (apache#3690)

- Update DatabaseType.java to allow schemaVersion up to 4
- Fix h2/schema-v4.sql to set version to 4 instead of 3
- Fix postgres/schema-v4.sql to set version to 4 instead of 3

Co-authored-by: Anand Kumar Sankaran <[email protected]>

* Add doc for behavior change configuration (apache#3677)

* Disable python update from GH action (apache#3678)

* Add support for rustfs test-containers (apache#3679)

* Fix CI (apache#3698)

* Fix status badge for README.md (apache#3696)

* Fix README.md CI URL

* Fix README.md CI URL

* Update dependency io.opentelemetry:opentelemetry-bom to v1.59.0 (apache#3704)

* Update dependency com.puppycrawl.tools:checkstyle to v13.2.0 (apache#3703)

* Update dependency com.google.errorprone:error_prone_core to v2.47.0 (apache#3702)

* Update registry.access.redhat.com/ubi9/openjdk-21-runtime Docker tag to v1.24-2.1770236038 (apache#3701)

* Update dependency ch.qos.logback:logback-classic to v1.5.28 (apache#3700)

* chore(deps): update dependency ipykernel to v7.2.0 (apache#3705)

* chore(deps): update docker.io/jaegertracing/jaeger docker tag to v2.15.0 (apache#3706)

* fix(deps): update dependency io.smallrye.common:smallrye-common-annotation to v2.16.0 (apache#3707)

* Last merged commit de43d1d

* Fix integration-tests after apache#3704

Short story: This change fixes the int-tests in `:polaris-runtime-service`.

Long story is this:
In `:polaris-runtime-service` we intentionally declare "just" a `platform(libs.quarkus.bom)` dependency, because of the `:polaris-spark-integration-*` module.
It is however generally recommended by Quarkus to _only_ use `enforcedPlatform(libs.quarkus.bom)` to effectively prevent breaking changes coming from transitive dependencies. Which is exactly what happend after apache#3704.

Why did CI not catch this issue?
The answer is pretty simple: The effective Gradle task inputs, including the `intTestRuntimeClasspath` did not change. So the previously cached test outcomes could be reused, and the int-tests did not run.
Just adding the `runtimeClasspath` as another task-input of the `intTest` may _not_ work as intended.

Why does _removing_ `implementation(platform(libs.opentelemetry.bom))` help?
Simply because that lets the dependencies fall back to to the declared transitive dependencies. We do not have direct dependencies to OTel.

The correct fix here _would_ be to use `implementation(enforcedPlatform(libs.quarkus.bom))`, but that breaks the Spark plugin integration tests.

There is a better alternative: Let the Spark plugin tests leverage polaris-apprunner, which is meant for exactly the use case of effectively decoupling some module from the build requirements of a Quarkus application.

---------

Co-authored-by: Alexandre Dutra <[email protected]>
Co-authored-by: Anand K Sankaran <[email protected]>
Co-authored-by: Anand Kumar Sankaran <[email protected]>
Co-authored-by: Yong Zheng <[email protected]>
Co-authored-by: Mend Renovate <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants