Fix FileIO usage in PolarisRestCatalogIntegrationBase#3658
Fix FileIO usage in PolarisRestCatalogIntegrationBase#3658adutra merged 2 commits intoapache:mainfrom
Conversation
|
Heads up: this change may have implications for the following cloud tests:
However I cannot test them 🤷♂️ The first 3 are probably fine: credential vending is implicitly enabled, but the clients don't request any access delegation mode by default. This should work as long as the The last one, @RestCatalogConfig({"header.X-Iceberg-Access-Delegation", "vended-credentials"}) |
| } | ||
|
|
||
| @Override | ||
| protected Map<String, String> extraCatalogProperties(TestInfo testInfo) { |
There was a problem hiding this comment.
This is admittedly hacky, but I couldn't find a better way to tweak just this test's RESTCatalog without introducing unwanted S3 credentials in all other tests. Unfortunately we can't annotate the test with @RestCatalogConfig because endpoint is computed dynamically.
There was a problem hiding this comment.
Heads up: with the new register table endpoint with access delegation coming in Iceberg 1.11, this trick will become obsolete.
There was a problem hiding this comment.
The IcebergCatalogHandler.registerTable implementation just behaves differently than IcebergCatalogHandler.loadTable. Both return a LoadTableResponse, but the former doesn't vend credentials. I think, registerTable could easily do that, because the table exists when it returns?
(A change for a different PR though)
It might be good to add that annotation, yet |
| if (testInfo.getTestMethod().orElseThrow().getName().equals("testRegisterTable")) { | ||
| // This test registers a table – operation that doesn't support access delegation – | ||
| // then attempts to use the table's FileIO. This can only work if the client has its | ||
| // own S3 credentials. |
There was a problem hiding this comment.
Interesting... Is it possible that Polaris neglects to return the credentials in the registerTable response, or is it a problem in the IRC API spec?
There was a problem hiding this comment.
It's a problem in the spec: the registerTable endpoint does not declare the data-access header parameter:
I will reach out to the Iceberg ML
There was a problem hiding this comment.
Just curious, what triggers the usage of a hierarchical SAS token in the In any case, let's do this change in a follow-up PR. |
This: Vended credentials may not be used by clients (as you noted), but the server-side call path when it interacts with storage is the same as for vended credentials (i.e. it vends credentials to itself). |
Sure, but in |
I guess it's set to It's a bit convoluted, indeed 😅 |
Aaah right, I missed that! Thank you for the explanation. |
Currently, all tests inheriting from `PolarisRestCatalogIntegrationBase` are forcibly using `InMemoryFileIO` for all `RESTCatalog` instances. Furthermore, this class wasn't overriding `CatalogTests.baseTableLocation()`, so tests calling this method were actually operating on `file://` paths all the time. Because of that, some child test classes, and in particular, the MiniIO-based ones, weren't actually testing anything useful. This PR changes `PolarisRestCatalogIntegrationBase` to use a configurable `FileIO` impl, defaulting to `ResolvingFileIO`. This change uncovered a few latent issues with current tests, and notably: 1) Tests using MinIO should request credential vending consistently, otherwise most tests would fail because the S3 client attempts to load credentials from the environment. 2) Tests calling `registerTable` need special handling since this endpoint does NOT support access delegation at all. For these tests, the `RESTCatalog` clients need to be given direct credentials for S3 – since the server won't give them anything – even with vended credentials requested. Incidentally, this PR simplifies `IntegrationTestsHelper.mergeFromAnnotatedElements()` – but its functionality stays the same.
90327cd to
ddedf88
Compare
|
@dimas-b as a follow up to your explanation, I went ahead and added Can you have another look please? |
snazy
left a comment
There was a problem hiding this comment.
The change LGTM.
Thanks for fixing the tests!
I can live with a hack in tests, but I think we can get vended credentials from a register-table call even without waiting for the new spec. Polaris can behave as if the header is absent - or just evaluate the header as load-table does.
| } | ||
|
|
||
| @Override | ||
| protected Map<String, String> extraCatalogProperties(TestInfo testInfo) { |
There was a problem hiding this comment.
The IcebergCatalogHandler.registerTable implementation just behaves differently than IcebergCatalogHandler.loadTable. Both return a LoadTableResponse, but the former doesn't vend credentials. I think, registerTable could easily do that, because the table exists when it returns?
(A change for a different PR though)
Currently, all tests inheriting from `PolarisRestCatalogIntegrationBase` are forcibly using `InMemoryFileIO` for all `RESTCatalog` instances. Furthermore, this class wasn't overriding `CatalogTests.baseTableLocation()`, so tests calling this method were actually operating on `file://` paths all the time. Because of that, some child test classes, and in particular, the MiniIO-based ones, weren't actually testing anything useful. This PR changes `PolarisRestCatalogIntegrationBase` to use a configurable `FileIO` impl, defaulting to `ResolvingFileIO`. This change uncovered a few latent issues with current tests, and notably: 1) Tests using MinIO should request credential vending consistently, otherwise most tests would fail because the S3 client attempts to load credentials from the environment. 2) Tests calling `registerTable` need special handling since this endpoint does NOT support access delegation at all. For these tests, the `RESTCatalog` clients need to be given direct credentials for S3 – since the server won't give them anything – even with vended credentials requested. Incidentally, this PR simplifies `IntegrationTestsHelper.mergeFromAnnotatedElements()` – but its functionality stays the same.
* Remove "giant" constructors from Handlers and Adapters (apache#3669) This PR attempts to reduce the "giant constructor" symptoms in catalog adapters and handlers. * The `CatalogHandler` subtypes are now `@PolarisImmutable` objects, and thus can leverage the generated builder to create instances without the need for big constructors. * The handlers are now produced by a new factory, which is a CDI request-scoped bean with direct field injection to avoid constructor injection. * Fix FileIO usage in PolarisRestCatalogIntegrationBase (apache#3658) Currently, all tests inheriting from `PolarisRestCatalogIntegrationBase` are forcibly using `InMemoryFileIO` for all `RESTCatalog` instances. Furthermore, this class wasn't overriding `CatalogTests.baseTableLocation()`, so tests calling this method were actually operating on `file://` paths all the time. Because of that, some child test classes, and in particular, the MiniIO-based ones, weren't actually testing anything useful. This PR changes `PolarisRestCatalogIntegrationBase` to use a configurable `FileIO` impl, defaulting to `ResolvingFileIO`. This change uncovered a few latent issues with current tests, and notably: 1) Tests using MinIO should request credential vending consistently, otherwise most tests would fail because the S3 client attempts to load credentials from the environment. 2) Tests calling `registerTable` need special handling since this endpoint does NOT support access delegation at all. For these tests, the `RESTCatalog` clients need to be given direct credentials for S3 – since the server won't give them anything – even with vended credentials requested. Incidentally, this PR simplifies `IntegrationTestsHelper.mergeFromAnnotatedElements()` – but its functionality stays the same. * Automatically adjust gradlew during Gradle Wrapper updates (apache#3620) * CI for feature branches (apache#3686) This change enables CI on branches matching the pattern `feature/*`. This change does _not_ require PRs against `feature/*` branches to have "green" CI. * fix: Correct schema version number in schema-v4.sql files (apache#3690) - Update DatabaseType.java to allow schemaVersion up to 4 - Fix h2/schema-v4.sql to set version to 4 instead of 3 - Fix postgres/schema-v4.sql to set version to 4 instead of 3 Co-authored-by: Anand Kumar Sankaran <[email protected]> * Add doc for behavior change configuration (apache#3677) * Disable python update from GH action (apache#3678) * Add support for rustfs test-containers (apache#3679) * Fix CI (apache#3698) * Fix status badge for README.md (apache#3696) * Fix README.md CI URL * Fix README.md CI URL * Update dependency io.opentelemetry:opentelemetry-bom to v1.59.0 (apache#3704) * Update dependency com.puppycrawl.tools:checkstyle to v13.2.0 (apache#3703) * Update dependency com.google.errorprone:error_prone_core to v2.47.0 (apache#3702) * Update registry.access.redhat.com/ubi9/openjdk-21-runtime Docker tag to v1.24-2.1770236038 (apache#3701) * Update dependency ch.qos.logback:logback-classic to v1.5.28 (apache#3700) * chore(deps): update dependency ipykernel to v7.2.0 (apache#3705) * chore(deps): update docker.io/jaegertracing/jaeger docker tag to v2.15.0 (apache#3706) * fix(deps): update dependency io.smallrye.common:smallrye-common-annotation to v2.16.0 (apache#3707) * Last merged commit de43d1d * Fix integration-tests after apache#3704 Short story: This change fixes the int-tests in `:polaris-runtime-service`. Long story is this: In `:polaris-runtime-service` we intentionally declare "just" a `platform(libs.quarkus.bom)` dependency, because of the `:polaris-spark-integration-*` module. It is however generally recommended by Quarkus to _only_ use `enforcedPlatform(libs.quarkus.bom)` to effectively prevent breaking changes coming from transitive dependencies. Which is exactly what happend after apache#3704. Why did CI not catch this issue? The answer is pretty simple: The effective Gradle task inputs, including the `intTestRuntimeClasspath` did not change. So the previously cached test outcomes could be reused, and the int-tests did not run. Just adding the `runtimeClasspath` as another task-input of the `intTest` may _not_ work as intended. Why does _removing_ `implementation(platform(libs.opentelemetry.bom))` help? Simply because that lets the dependencies fall back to to the declared transitive dependencies. We do not have direct dependencies to OTel. The correct fix here _would_ be to use `implementation(enforcedPlatform(libs.quarkus.bom))`, but that breaks the Spark plugin integration tests. There is a better alternative: Let the Spark plugin tests leverage polaris-apprunner, which is meant for exactly the use case of effectively decoupling some module from the build requirements of a Quarkus application. --------- Co-authored-by: Alexandre Dutra <[email protected]> Co-authored-by: Anand K Sankaran <[email protected]> Co-authored-by: Anand Kumar Sankaran <[email protected]> Co-authored-by: Yong Zheng <[email protected]> Co-authored-by: Mend Renovate <[email protected]>
Currently, all tests inheriting from
PolarisRestCatalogIntegrationBaseare forcibly usingInMemoryFileIOfor allRESTCataloginstances.Furthermore, this class wasn't overriding
CatalogTests.baseTableLocation(), so tests calling this method were actually operating onfile://paths all the time.Because of that, some child test classes, and in particular, the MiniIO-based ones, weren't actually testing anything useful.
This PR changes
PolarisRestCatalogIntegrationBaseto use a configurableFileIOimpl, defaulting toResolvingFileIO.This change uncovered a few latent issues with current tests, and notably:
Tests using MinIO should request credential vending consistently, otherwise most tests would fail because the S3 client attempts to load credentials from the environment.
Tests calling
registerTableneed special handling since this endpoint does NOT support access delegation at all. For these tests, theRESTCatalogclients need to be given direct credentials for S3 – since the server won't give them anything – even with vended credentials requested.Incidentally, this PR simplifies
IntegrationTestsHelper.mergeFromAnnotatedElements()– but its functionality stays the same.Checklist
CHANGELOG.md(if needed)site/content/in-dev/unreleased(if needed)