Add support for customer write.data.path and write.metadata.path with test for object store location provider#193
Conversation
|
We had to tackle the same problem (object-storage layout) in Nessie, but also wanted to retain the ability to control the table's location for data security reasons and generating IAM policies. Technically, there's sadly not only The solution in Nessie to the "object-storage layout problem" is: If |
By "right" A user should be able to set the |
|
@snazy I added an extra test verifying that object-store layout already works under the table's default location. I think this should work for anyone who really wants credential-vending to be specifically scoped to the table location, while the extra configuration allows for a less strict solution for use cases that want to trade credential vending for performance/scale |
There was a problem hiding this comment.
This is a good but I wonder if we need a way to add even more locations, the basic issue is this only represents the "latest" location that data or metadata could be written to and a user could have set a different value to these properties in the past.
There was a problem hiding this comment.
This is the purpose of the allowedLocations field of the storageConfiguration. Currently, we can't set table-level StorageConfiguration, but part of the reason for implementing the StorageConfigurationOverride is to help pave the way toward that.
There was a problem hiding this comment.
computeIfPresent May work here?
There was a problem hiding this comment.
Maybe if Map had a ifPresent(Consumer<T>) method...
polaris-service/src/main/java/io/polaris/service/catalog/BasePolarisCatalog.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Check Optional.ofNullable to simplify this ternary
There was a problem hiding this comment.
scratch that, l is a list here, ... hmm
polaris-core/src/main/java/io/polaris/core/storage/PolarisStorageConfigurationInfo.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
It looks like to avoid always doing a storage configuration override we are using the Optional path below, It just feels like it may be simpler to just always add Locs to the Config?
Or we could just do
if (userSpecifiedWriteLocations.notEmpty)
return Storage Override
else
configInfo
I'm not sure the functional style here makes it much more readable, One of those things that I think would be much cleaner in Scala.
There was a problem hiding this comment.
Hmm, yeah I think we can just always add the StorageConfigurationOverride with the empty locs if not set
RussellSpitzer
left a comment
There was a problem hiding this comment.
At a High level I think this is is a great idea, I do think we should probably reduce some of the functional approach here which I think is a little complicated for what we are trying to accomplish. If we were in Scala I feel like it would be very simple but in Java we have to do the List -> Option swap which I don't think is very easy to read.
I don't have a strong feeling about that though if other engineers want to go with this style everywhere in the project.
There are some references to "snowman" in the test file which I think we may need to change now, but I see there are other references there already so this isn't a deal breaker for me.
Also for future problems, we probably will need to consider arbitrary additional paths but i'm not sure how we can do that at the moment.
64e0624 to
0c32a7a
Compare
* Remove "giant" constructors from Handlers and Adapters (apache#3669) This PR attempts to reduce the "giant constructor" symptoms in catalog adapters and handlers. * The `CatalogHandler` subtypes are now `@PolarisImmutable` objects, and thus can leverage the generated builder to create instances without the need for big constructors. * The handlers are now produced by a new factory, which is a CDI request-scoped bean with direct field injection to avoid constructor injection. * Fix FileIO usage in PolarisRestCatalogIntegrationBase (apache#3658) Currently, all tests inheriting from `PolarisRestCatalogIntegrationBase` are forcibly using `InMemoryFileIO` for all `RESTCatalog` instances. Furthermore, this class wasn't overriding `CatalogTests.baseTableLocation()`, so tests calling this method were actually operating on `file://` paths all the time. Because of that, some child test classes, and in particular, the MiniIO-based ones, weren't actually testing anything useful. This PR changes `PolarisRestCatalogIntegrationBase` to use a configurable `FileIO` impl, defaulting to `ResolvingFileIO`. This change uncovered a few latent issues with current tests, and notably: 1) Tests using MinIO should request credential vending consistently, otherwise most tests would fail because the S3 client attempts to load credentials from the environment. 2) Tests calling `registerTable` need special handling since this endpoint does NOT support access delegation at all. For these tests, the `RESTCatalog` clients need to be given direct credentials for S3 – since the server won't give them anything – even with vended credentials requested. Incidentally, this PR simplifies `IntegrationTestsHelper.mergeFromAnnotatedElements()` – but its functionality stays the same. * Automatically adjust gradlew during Gradle Wrapper updates (apache#3620) * CI for feature branches (apache#3686) This change enables CI on branches matching the pattern `feature/*`. This change does _not_ require PRs against `feature/*` branches to have "green" CI. * fix: Correct schema version number in schema-v4.sql files (apache#3690) - Update DatabaseType.java to allow schemaVersion up to 4 - Fix h2/schema-v4.sql to set version to 4 instead of 3 - Fix postgres/schema-v4.sql to set version to 4 instead of 3 Co-authored-by: Anand Kumar Sankaran <[email protected]> * Add doc for behavior change configuration (apache#3677) * Disable python update from GH action (apache#3678) * Add support for rustfs test-containers (apache#3679) * Fix CI (apache#3698) * Fix status badge for README.md (apache#3696) * Fix README.md CI URL * Fix README.md CI URL * Update dependency io.opentelemetry:opentelemetry-bom to v1.59.0 (apache#3704) * Update dependency com.puppycrawl.tools:checkstyle to v13.2.0 (apache#3703) * Update dependency com.google.errorprone:error_prone_core to v2.47.0 (apache#3702) * Update registry.access.redhat.com/ubi9/openjdk-21-runtime Docker tag to v1.24-2.1770236038 (apache#3701) * Update dependency ch.qos.logback:logback-classic to v1.5.28 (apache#3700) * chore(deps): update dependency ipykernel to v7.2.0 (apache#3705) * chore(deps): update docker.io/jaegertracing/jaeger docker tag to v2.15.0 (apache#3706) * fix(deps): update dependency io.smallrye.common:smallrye-common-annotation to v2.16.0 (apache#3707) * Last merged commit de43d1d * Fix integration-tests after apache#3704 Short story: This change fixes the int-tests in `:polaris-runtime-service`. Long story is this: In `:polaris-runtime-service` we intentionally declare "just" a `platform(libs.quarkus.bom)` dependency, because of the `:polaris-spark-integration-*` module. It is however generally recommended by Quarkus to _only_ use `enforcedPlatform(libs.quarkus.bom)` to effectively prevent breaking changes coming from transitive dependencies. Which is exactly what happend after apache#3704. Why did CI not catch this issue? The answer is pretty simple: The effective Gradle task inputs, including the `intTestRuntimeClasspath` did not change. So the previously cached test outcomes could be reused, and the int-tests did not run. Just adding the `runtimeClasspath` as another task-input of the `intTest` may _not_ work as intended. Why does _removing_ `implementation(platform(libs.opentelemetry.bom))` help? Simply because that lets the dependencies fall back to to the declared transitive dependencies. We do not have direct dependencies to OTel. The correct fix here _would_ be to use `implementation(enforcedPlatform(libs.quarkus.bom))`, but that breaks the Spark plugin integration tests. There is a better alternative: Let the Spark plugin tests leverage polaris-apprunner, which is meant for exactly the use case of effectively decoupling some module from the build requirements of a Quarkus application. --------- Co-authored-by: Alexandre Dutra <[email protected]> Co-authored-by: Anand K Sankaran <[email protected]> Co-authored-by: Anand Kumar Sankaran <[email protected]> Co-authored-by: Yong Zheng <[email protected]> Co-authored-by: Mend Renovate <[email protected]>
Description
Custom
write.data.pathandwrite.metadata.pathare currently blocked, preventing users from following the guidelines in https://iceberg.apache.org/docs/1.5.1/aws/#object-store-file-layout . This updates the blocking to allow for customized write paths with the existingALLOW_UNSTRUCTURED_TABLE_LOCATIONconfiguration flag. This also adds a test using the object store location provider feature.Fixes #113
Type of change
Please delete options that are not relevant.
How Has This Been Tested?
New pytest for S3
Test Configuration:
Checklist:
Please delete options that are not relevant.