[Issue 3209] Surface Intermediate Data as part of Polaris Events#3456
[Issue 3209] Surface Intermediate Data as part of Polaris Events#3456HonahX merged 10 commits intoapache:mainfrom
Conversation
evindj
left a comment
There was a problem hiding this comment.
overall looks good to me, I left some questions mostly for my own understanding of the commit and event flow.
...java/org/apache/polaris/service/catalog/iceberg/IcebergRestCatalogEventServiceDelegator.java
Show resolved
Hide resolved
...ice/src/test/java/org/apache/polaris/service/catalog/iceberg/CommitTransactionEventTest.java
Show resolved
Hide resolved
...ice/src/test/java/org/apache/polaris/service/catalog/iceberg/CommitTransactionEventTest.java
Show resolved
Hide resolved
dimas-b
left a comment
There was a problem hiding this comment.
Thanks, @adnanhemani ! The general direction of this PR LGTM... some comments below.
.../service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java
Outdated
Show resolved
Hide resolved
runtime/service/src/main/java/org/apache/polaris/service/events/EventAttributeMap.java
Show resolved
Hide resolved
runtime/service/src/main/java/org/apache/polaris/service/events/EventAttributeMap.java
Show resolved
Hide resolved
.../service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Shouldn't we change all these new AttributeMap() occurrences to use the injected attribute map instead?
There was a problem hiding this comment.
I had a different thought - but would like to hear your thoughts on this as well.
I was using AttributeMap to inject into the deeper parts of the codebase - but then using that to populate the final AttributeMap that gets passed to the Event Listener. The chief concern was to ensure that the After event attributes don't accidentally become strictly a super-set of the Before events attributes - and that cleaner attribute maps going into the event listener may help with filtering concerns that we will need to tackle in order to enable all events going to their eventual sink.
WDYT?
There was a problem hiding this comment.
EventAttributeMap is a proper CDI bean now. It should be ok to rely on CDI to create and pass around (inject) one instance of it per request.
If we want to distinguish before/after data, we could do that by code / methods inside EventAttributeMap. As I commented in another thread, I see value in evolving this class to become a rich and typed container of event-related data for each request (more than a simple key-value map).
There was a problem hiding this comment.
@dimas-b, if you're okay with it, let's handle this in a separate PR then? I think we may need to iterate on this idea a few times before we get it right - but this PR is attempting to do something very related but slightly orthogonal and IMO enough for a first take at this functionality.
There was a problem hiding this comment.
I do not mind incremental work here, but please sync with @adutra in terms of timing.
...java/org/apache/polaris/service/catalog/iceberg/IcebergRestCatalogEventServiceDelegator.java
Show resolved
Hide resolved
.../service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java
Outdated
Show resolved
Hide resolved
runtime/service/src/main/java/org/apache/polaris/service/events/EventAttributes.java
Outdated
Show resolved
Hide resolved
adutra
left a comment
There was a problem hiding this comment.
The PR is almost good to go imho, we just need to settle on the LOAD_TABLE_RESPONSES attribute.
.../service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogHandler.java
Outdated
Show resolved
Hide resolved
478e8df to
344a434
Compare
runtime/service/src/main/java/org/apache/polaris/service/events/EventAttributes.java
Outdated
Show resolved
Hide resolved
| public static final AttributeKey<List<TableMetadata>> TABLE_METADATA = | ||
| new AttributeKey<>("table_metadatas", new TypeToken<List<TableMetadata>>() {}); |
There was a problem hiding this comment.
The field name should reflect the attribute name: TABLE_METADATAS.
There was a problem hiding this comment.
There seems to be an inconsistency here: sometimes this attribute seems to be legitimately collecting a list of metadatas, cf. IcebergCatalogHandler. But sometimes, it's collecting a singleton list of a unique table metadata, cf. IcebergCatalogEventServiceDelegator.
I think it would be better to distinguish two attributes: TABLE_METADATA and TABLE_METADATAS. (I understand that the latter may become a sort of "internal" attribute that is never exposed to listeners – we could maybe declare those separately if we are concerned about that.)
There was a problem hiding this comment.
Sure, I'm glad to do this. I was trying to make everything fit within TABLE_METADATAS due to the similar comment that Dmitri had pointed out above. But I will make a comment for this for now so we can move past and revisit if there is a better way to do this later.
runtime/service/src/main/java/org/apache/polaris/service/events/EventAttributes.java
Outdated
Show resolved
Hide resolved
...java/org/apache/polaris/service/catalog/iceberg/IcebergRestCatalogEventServiceDelegator.java
Outdated
Show resolved
Hide resolved
dimas-b
left a comment
There was a problem hiding this comment.
Thanks, @adnanhemani ! The PR LGTM 👍
HonahX
left a comment
There was a problem hiding this comment.
LGTM!
Thanks @adnanhemani for the work and @evindj @dimas-b @adutra for reviewing! Looks like all the comments are either resolved or planned to resolve incrementally. Merging.
…che#3456) Adds a RequestScoped annotation to AttributeMap, which allows us to inject this bean into intermediary business logic and be able to patch data from there back out to the PolarisEvent. @dimas-b had a great idea that we may be able to retire the Delegator model eventually by setting a "closing" hook on the RequestScoped bean - but for simplicity, I am introducing this smaller change that will solve the GH issue linked first and will investigate that approach later.
* Nit: fix shell-check warnings for await-s3.sh (apache#3593) * Nit: add upgrade notes for develocity plugin (apache#3599) * fix(deps): update dependency org.apache.hive:hive-exec to v2.3.10 (apache#3580) * fix(deps): update dependency com.google.cloud:google-cloud-storage-bom to v2.62.1 (apache#3600) * Change quick start example from MinIO to RustFS backend (apache#3586) * chore(deps): update gradle/actions digest to f29f5a9 (apache#3602) * Simplify getting started example for Ceph (apache#3587) * Simplify getting started example for Ceph * Simplify getting started example for Ceph * fix(deps): update dependency io.smallrye.config:smallrye-config-core to v3.16.0 (apache#3604) * chore(deps): update docker.io/mongo docker tag to v8.2.4 (apache#3606) * fix(deps): update dependency net.ltgt.gradle:gradle-errorprone-plugin to v5 (apache#3605) * chore(deps): update plugin com.gradle.develocity to v4.3.2 (apache#3598) * chore(deps): update localstack/localstack docker tag to v4.13 (apache#3609) * Guides: use `sql` code block (apache#3590) Replaces the "anonymous" code blocks with `sql` code blocks, which is easier to "copy + paste" for users. * Spark guide: document the launch command in the Markdown (apache#3592) Replaces the "invisible" launch-docker.sh script with a "visible" `shell` code block, so the commands are immediately visible to users. * Minio guide: fix service dependencies (apache#3594) Use the "long" options for compose service dependencies, and adds the explicit option `restart: no` option for setup tasks, which is necessary to let docker-compose correctly interpret the termination of such tasks. * Ozone guide: Fix service dependencies (apache#3595) Use the "long" options for compose service dependencies, add healthchecks, uses more healthcheck-retries for slow-ish machines, and adds the explicit option `restart: no` option for setup tasks, which is necessary to let docker-compose correctly interpret the termination of such tasks. * Rustfs guide: fix service dependencies (apache#3596) Just adds a necessary service dependency * Telemetry guide: fix setup-polaris service (apache#3597) Fixes the polaris-setup service to use the "working" way/syntax, in alignment to other guides. * Site-workflow: add `workflow_call` trigger (apache#3520) This trigger allows the Hugo Site publication workflow to be called from the `versioned-docs` branch. This change also introduces a concurrency group to prevent concurrent site publications. Contributed to apache#3516 Co-authored-by: Dmitri Bourlatchkov <[email protected]> * [Issue 3209] Surface Intermediate Data as part of Polaris Events (apache#3456) Adds a RequestScoped annotation to AttributeMap, which allows us to inject this bean into intermediary business logic and be able to patch data from there back out to the PolarisEvent. @dimas-b had a great idea that we may be able to retire the Delegator model eventually by setting a "closing" hook on the RequestScoped bean - but for simplicity, I am introducing this smaller change that will solve the GH issue linked first and will investigate that approach later. * Getting-Started: Jaeger upgraded from V1 to V2 (apache#3603) * Jaeger V2 * Jaeger V2 * Update Gradle to 9.3.1 (apache#3615) * Last merged commit 90ba09c --------- Co-authored-by: Mend Renovate <[email protected]> Co-authored-by: Yong Zheng <[email protected]> Co-authored-by: Dmitri Bourlatchkov <[email protected]> Co-authored-by: Adnan Hemani <[email protected]>
Adds a RequestScoped annotation to
AttributeMap, which allows us to inject this bean into intermediary business logic and be able to patch data from there back out to thePolarisEvent.@dimas-b had a great idea that we may be able to retire the Delegator model eventually by setting a "closing" hook on the RequestScoped bean - but for simplicity, I am introducing this smaller change that will solve the GH issue linked first and will investigate that approach later.
Closes #3209.
Checklist
CHANGELOG.md(if needed)site/content/in-dev/unreleased(if needed)