Flatten events hierarchy#3293

Merged
adutra merged 10 commits into apache:main from olsoloviov:feat/flatten-events-hierarchy
Jan 13, 2026

Conversation

@olsoloviov
Contributor

Flatten the Polaris events structure so that there is a single PolarisEvent implementation with an enum type and custom attributes.
This is an implementation of @adutra's proposal: https://lists.apache.org/thread/xonxwf9b38t9cxo841r0hn1b34plf7og
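
To illustrate the shape described above, here is a minimal sketch, not the actual Polaris code. The names `EventType`, `Key` and `Event` are hypothetical stand-ins for `PolarisEventType`, `AttributeKey` and `PolarisEvent`, and the real classes carry more (metadata, validation of allowed types).

```java
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for PolarisEventType: one enum constant per event.
enum EventType { BEFORE_CREATE_TABLE, AFTER_CREATE_TABLE }

// Hypothetical stand-in for AttributeKey: a name plus the expected value type.
record Key<T>(String name, Class<T> type) {}

// Hypothetical stand-in for PolarisEvent: a single class for every event.
record Event(EventType type, Map<Key<?>, Object> attributes) {
  Event {
    attributes = Map.copyOf(attributes); // defensive, unmodifiable copy
  }

  <T> Optional<T> attribute(Key<T> key) {
    // Class.cast throws ClassCastException if the stored value has the wrong type.
    return Optional.ofNullable(attributes.get(key)).map(key.type()::cast);
  }
}

class FlattenedEventDemo {
  static final Key<String> TABLE_NAME = new Key<>("table_name", String.class);

  public static void main(String[] args) {
    Event event = new Event(EventType.BEFORE_CREATE_TABLE, Map.of(TABLE_NAME, "orders"));
    System.out.println(event.type() + " " + event.attribute(TABLE_NAME).orElse("?"));
  }
}
```

A listener then only needs to look at `type()` and read the attributes it cares about, instead of depending on one class per event.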

Checklist

  • 🛡️ Don't disclose security issues! (contact [email protected])
  • 🔗 Clearly explained why the changes are needed, or linked related issues: Fixes #
  • 🧪 Added/updated tests with good coverage, or manually tested (and explained how)
  • 💡 Added comments for complex logic
  • 🧾 Updated CHANGELOG.md (if needed)
  • 📚 Updated documentation in site/content/in-dev/unreleased (if needed)

Contributor

@adutra left a comment

Very nice work @olsoloviov, that is very much what I had in mind. Left a few minor comments. Thanks!

@SuppressWarnings("unchecked")
public static final AttributeKey<Map<String, String>> NAMESPACE_PROPERTIES =
(AttributeKey<Map<String, String>>)
(AttributeKey<?>) AttributeKey.of("namespace_properties", Map.class);
Contributor

I'm wondering: in order to increase portability, shouldn't we restrict the types of attributes that we can put in this map? An attribute of type Map<String, String> is still OK, I guess, but what if the attribute doesn't have a clear serialized format, e.g. Optional or Function? Restricting to types that can be safely serialized to JSON (and maybe gRPC) would imho make things easier for listener implementors.

Contributor Author

@olsoloviov, Dec 18, 2025

Shouldn't it be handled during the serialization phase? We will have pruning anyway. Also, not sure if it is a desirable scenario, but some non-serializable helper context could be passed with an event.
If you strongly prefer to restrict the types, we could whitelist the types when attributes are created, wdyt @adutra?

Contributor

Admittedly I haven't thought through all the ramifications, but initially I agree with @adutra on this one. Maybe we just keep the bar that whatever the object is, it must be implementing Serializable?

Personally, I'd rather have this enforced at the time of the event generation than having to deal with it later in the Event lifecycle - but not a strong opinion.

Contributor Author

I have added a whitelist of allowed attribute types.

Contributor

While widely adopted in the Hadoop ecosystem, Serializable is generally viewed as a legacy feature in Java. Leveraging this marker interface here imho wouldn't achieve our core objective, which is to ensure the attributes map can be easily serialized using common wire formats, particularly JSON. For example, many objects that we want to allow aren't Serializable, e.g. the request and response types.

All of this to say: if my suggestion proves too difficult to implement, I am fine if we don't enforce anything.

}

protected abstract void processEvent(String realmId, PolarisEvent event);
private <T> T getRequiredAttribute(PolarisEvent event, AttributeKey<T> key) {
Contributor

This could be declared in PolarisEvent, I think it's going to be useful for many listeners, not just this one.

Contributor Author

Moved to PolarisEvent, also switched to throw instead of logging

}
}

private void handleAfterCreateTable(PolarisEvent event) {
Contributor

Again, with flattened events hierarchy there is not a huge difference between these two handleXYZ methods. It would certainly be possible to create a unified handleEvent method that is valid for all events, thus saving us the hassle of writing 150+ methods (just a thought for later, not for this PR though).
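
As a rough illustration of the unified handler idea, here is a minimal sketch under simplifying assumptions: `EventType`, `Event` and `UnifiedHandler` are hypothetical, and attributes are keyed by plain strings rather than by `AttributeKey`.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical, simplified event: an enum type plus attributes keyed by name.
enum EventType { AFTER_CREATE_TABLE, AFTER_DROP_TABLE }

record Event(EventType type, Map<String, Object> attributes) {}

class UnifiedHandler {
  // One method for all event types: copies every attribute into a flat,
  // string-valued map that a listener could persist or log.
  static Map<String, String> handleEvent(Event event) {
    Map<String, String> row = new TreeMap<>();
    row.put("event_type", event.type().name());
    event.attributes().forEach((name, value) -> row.put(name, String.valueOf(value)));
    return row;
  }

  public static void main(String[] args) {
    Event event = new Event(EventType.AFTER_CREATE_TABLE, Map.of("table_name", "orders"));
    System.out.println(handleEvent(event));
  }
}
```

Because every event exposes the same accessors, adding a new event type would not require a new handler method in a listener written this way.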

Contributor

@adnanhemani left a comment

A very nice job done on this one, @olsoloviov - thank you so much! I left a handful of nit comments and a few minor ones as well, but I think this is very close to merging!

Assertions.assertThat(beforeRefreshEvent.tableIdentifier()).isEqualTo(TestData.TABLE);
PolarisEvent beforeRefreshEvent =
testPolarisEventListener.getLatest(PolarisEventType.BEFORE_REFRESH_TABLE);
Assertions.assertThat(beforeRefreshEvent.attribute(EventAttributes.TABLE_IDENTIFIER))
Contributor

nit: Let's test these using requiredAttribute rather than attribute?

Contributor

Commenting in case you forgot to make this change :)

}

@Test
void testTypeCastValidValue() {
Contributor

Not sure I see value in this test case. Maybe do something like making an AttributeKey of a Long value, then give it an int and see if it is cast properly?

IOW, we shouldn't really be testing cast if we didn't write it. We should be testing the code that we wrote.

Contributor Author

Thanks, @adnanhemani, for your comments, I agree that it does not make any sense to test Class.cast()

}

@Test
void testTypeCastInvalidType() {
Contributor

Similar to above, I would move this test into the test for PolarisEvent and test that sending an int to an AttributeKey would throw when calling key.attribute(..., 123).

void testBuilderCreatesEvent() {
PolarisEvent event =
PolarisEvent.builder(PolarisEventType.BEFORE_CREATE_TABLE, TEST_METADATA)
.attribute(EventAttributes.CATALOG_NAME, "my-catalog")
Contributor

nit: extract the magic variables for maintainability


assertThatThrownBy(() -> event.requiredAttribute(EventAttributes.CATALOG_NAME))
.isInstanceOf(IllegalStateException.class)
.hasMessageContaining("catalog_name")
Contributor

nit: Can we test for a more descriptive message? Not looking for exact string matching but something similar to "Required attribute" and/or "not found in event".

Boolean.class,
Integer.class,
Long.class,
Double.class,
Contributor

Missing Float? Also: we could probably authorize any type that extends Number, including BigInteger and BigDecimal.

Long.class,
Double.class,
// Collections
List.class,
Contributor

This is probably too loose. We are not checking the collections' element types here, so one could include a list or a set of any element type, effectively defeating the purpose of this class in the first place. I would suggest also checking the element type and the map's key and value types: these should conform to the list specified here as well.

TableMetadata.class,
ViewMetadata.class,
// Iceberg REST request types
CreateNamespaceRequest.class,
Contributor

Maybe you could check if the type implements RESTMessage?

Contributor

Here is a suggestion that leverages TypeToken in order to check element types of List, Set and Map:

  static final Set<Class<?>> ALLOWED_TYPES =
      Set.of(
          String.class,
          Boolean.class,
          Number.class,
          RESTMessage.class,
          Namespace.class,
          TableIdentifier.class,
          TableMetadata.class,
          ViewMetadata.class,
          Catalog.class,
          Principal.class,
          PrincipalRole.class,
          PrincipalWithCredentials.class,
          CatalogRole.class,
          GrantResource.class,
          PolarisPrivilege.class,
          GenericTable.class);

  static boolean isAllowed(TypeToken<?> type) {
    Class<?> rawType = type.getRawType();
    if (rawType.equals(List.class)) {
      TypeToken<?> elementType = type.resolveType(List.class.getTypeParameters()[0]);
      return isSubtypeOfAllowedType(elementType.getRawType());
    } else if (rawType.equals(Set.class)) {
      TypeToken<?> elementType = type.resolveType(Set.class.getTypeParameters()[0]);
      return isSubtypeOfAllowedType(elementType.getRawType());
    } else if (rawType.equals(Map.class)) {
      TypeToken<?> keyType = type.resolveType(Map.class.getTypeParameters()[0]);
      TypeToken<?> valueType = type.resolveType(Map.class.getTypeParameters()[1]);
      return isSubtypeOfAllowedType(keyType.getRawType())
          && isSubtypeOfAllowedType(valueType.getRawType());
    } else {
      return isSubtypeOfAllowedType(rawType);
    }
  }

  private static boolean isSubtypeOfAllowedType(Class<?> rawType) {
    return ALLOWED_TYPES.stream().anyMatch(t -> t.isAssignableFrom(rawType));
  }

Contributor

I've seen the updated version, that's much better, thanks! But I'm now concerned about the performance aspect. All this computation is done for every event attribute. I am worried that this is a bit too much. @adnanhemani and @olsoloviov, wdyt? Am I overthinking this?

*/
public final class AttributeKey<T> {
private final String name;
private final Class<T> type;
Contributor

I suggest that we use Guava's com.google.common.reflect.TypeToken here, as it handles generic types as well.
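
To show why a type token helps with generic attribute types, here is a minimal sketch using only the JDK. `TypeRef` is a hypothetical stand-in that relies on the same idea as Guava's `TypeToken` (an anonymous subclass keeps its type argument available to reflection); it is not Guava's API.

```java
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.util.Map;

// Hypothetical stand-in for a type token: an anonymous subclass records its
// type argument in the class file, so it can be read back reflectively.
abstract class TypeRef<T> {
  private final Type type;

  protected TypeRef() {
    ParameterizedType superType = (ParameterizedType) getClass().getGenericSuperclass();
    this.type = superType.getActualTypeArguments()[0];
  }

  Type type() {
    return type;
  }
}

class TypeRefDemo {
  // No unchecked cast needed: the full generic type is captured, not just Map.class.
  static final TypeRef<Map<String, String>> NAMESPACE_PROPERTIES_TYPE =
      new TypeRef<Map<String, String>>() {};

  public static void main(String[] args) {
    System.out.println(NAMESPACE_PROPERTIES_TYPE.type().getTypeName());
  }
}
```

With a plain `Class<T>` the key can only record `Map.class`, which is why the `NAMESPACE_PROPERTIES` declaration needs a double cast and `@SuppressWarnings("unchecked")`.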

*
* @param <T> the type of the attribute value
*/
public final class AttributeKey<T> {
Contributor

This class could be a record. Then the constructor could be public and we wouldn't need the static of method.

The following should be enough (also introducing support for TypeToken):

public record AttributeKey<T>(@JsonValue String name, TypeToken<T> type) {

  public AttributeKey(String name, Class<T> type) {
    this(name, TypeToken.of(type));
  }

  public AttributeKey(String name, TypeToken<T> type) {
    this.name = Objects.requireNonNull(name, "name");
    this.type = Objects.requireNonNull(type, "type");
    if (!AllowedAttributeTypes.isAllowed(type)) {
      throw new IllegalArgumentException("Type " + type + " is not allowed for event attributes");
    }
  }
}

AttributeKey.of("register_table_request", RegisterTableRequest.class);
public static final AttributeKey<RenameTableRequest> RENAME_TABLE_REQUEST =
AttributeKey.of("rename_table_request", RenameTableRequest.class);
public static final AttributeKey<TableMetadata> TABLE_METADATA_BEFORE =
Contributor

My IDE is flagging some attributes as unused, which is intriguing. This is one of them.

Contributor Author

Thanks for catching, these are not needed after ...CommitTable events were removed

* Represents an event emitted by Polaris. Events have a type, metadata, and a map of typed
* attributes. Use {@link #builder(PolarisEventType, PolarisEventMetadata)} to create instances.
*/
public record PolarisEvent(
Contributor

FYI we can leverage the Java Immutables library to create the builder class for us. Here is how to do it:

@Value.Style(depluralize = true)
public record PolarisEvent(
    PolarisEventType type, PolarisEventMetadata metadata, Map<AttributeKey<?>, Object> attributes) {

  @Builder.Constructor
  public PolarisEvent { ...  }

}

Then we can create an event like this:

PolarisEvent event =
  new PolarisEventBuilder()
      .type(PolarisEventType.BEFORE_LIMIT_REQUEST_RATE)
      .metadata(eventMetadataFactory.create())
      .putAttribute(EventAttributes.HTTP_METHOD, ctx.getMethod())
      .putAttribute(
          EventAttributes.REQUEST_URI, ctx.getUriInfo().getAbsolutePath().toString())
      .build();

Another option is to transform the record into an interface and annotate with @PolarisImmutable.

Contributor Author

Thanks for the suggestion. I see only one issue with this approach: putAttribute() will not be type-safe, e.g. something like .putAttribute(EventAttributes.REQUEST_URI, 123) will compile.

Contributor

That's right indeed. But now that we have AttributeMap, shouldn't we move those put methods to AttributeMap?
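
A type-safe put of the kind discussed here could look roughly like the following. This is a sketch only: `Key` and `TypedAttributes` are hypothetical names, and the version that was eventually merged may differ.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for AttributeKey.
record Key<T>(String name, Class<T> type) {}

// Sketch of a map whose put/get signatures tie the value type to the key type.
final class TypedAttributes {
  private final Map<Key<?>, Object> attributes = new HashMap<>();

  // The compiler rejects put(Key<String>, 123) because T is fixed by the key.
  <T> TypedAttributes put(Key<T> key, T value) {
    attributes.put(key, key.type().cast(value)); // runtime check guards raw-type misuse
    return this;
  }

  <T> Optional<T> get(Key<T> key) {
    return Optional.ofNullable(key.type().cast(attributes.get(key)));
  }
}

class TypedAttributesDemo {
  static final Key<String> REQUEST_URI = new Key<>("request_uri", String.class);

  public static void main(String[] args) {
    TypedAttributes attrs = new TypedAttributes().put(REQUEST_URI, "/v1/config");
    // attrs.put(REQUEST_URI, 123); // does not compile: an int is not a String
    System.out.println(attrs.get(REQUEST_URI).orElse("missing"));
  }
}
```

A generated builder method taking `(AttributeKey<?>, Object)` cannot express this link between key and value, which is the concern raised above.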

Contributor

@adutra, Jan 8, 2026

I implemented this idea, let me know what you all think:

adutra@2286bfd68

Contributor

I think @adutra's idea is likely better. Let's go with it.

Contributor Author

Thanks, @adutra, I see what you mean, looks good. I'd like to use your commit if you are ok with it.

Contributor

Of course, feel free to copy or cherry-pick.

PolarisEventType type, PolarisEventMetadata metadata, Map<AttributeKey<?>, Object> attributes) {

public PolarisEvent {
attributes = Collections.unmodifiableMap(new HashMap<>(attributes));
Contributor

Suggested change
- attributes = Collections.unmodifiableMap(new HashMap<>(attributes));
+ attributes = Map.copyOf(attributes);
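
One behavioural difference between the two forms is worth keeping in mind: `Map.copyOf` (Java 10+) rejects null keys and values, whereas a `HashMap` copy wrapped in `Collections.unmodifiableMap` keeps them. The sketch below demonstrates this with a hypothetical `MapCopyDemo` class; the attribute names are made up.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class MapCopyDemo {
  // Old form: unmodifiable view over a private HashMap copy; null values are kept.
  static Map<String, Object> viaWrapper(Map<String, Object> source) {
    return Collections.unmodifiableMap(new HashMap<>(source));
  }

  // Suggested form: shorter, but throws NullPointerException on null keys or values.
  static Map<String, Object> viaCopyOf(Map<String, Object> source) {
    return Map.copyOf(source);
  }

  public static void main(String[] args) {
    Map<String, Object> source = new HashMap<>();
    source.put("catalog_name", "my-catalog");
    System.out.println(viaWrapper(source).equals(viaCopyOf(source)));

    source.put("nullable", null);
    System.out.println(viaWrapper(source).containsKey("nullable"));
    try {
      viaCopyOf(source);
    } catch (NullPointerException e) {
      System.out.println("Map.copyOf rejected the null value");
    }
  }
}
```

If attribute values are never null, the two forms are interchangeable for this purpose and the shorter one reads better.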

* attributes. Use {@link #builder(PolarisEventType, PolarisEventMetadata)} to create instances.
*/
public record PolarisEvent(
PolarisEventType type, PolarisEventMetadata metadata, Map<AttributeKey<?>, Object> attributes) {
Contributor

@adutra, Jan 6, 2026

Nit: exposing a Map<AttributeKey<?>, Object> parameter is a bit low-level. It would be better to introduce an AttributeMap class with type-safe operations for get and put. That's what Netty does, cf. io.netty.util.AttributeMap.

Very simple impl suggestion:

public final class AttributeMap {

  private final Map<AttributeKey<?>, Object> attributes = new HashMap<>();

  @SuppressWarnings("unchecked")
  public <T> Optional<T> get(AttributeKey<T> key) {
    return Optional.ofNullable((T) attributes.get(key));
  }

  public <T> T getRequired(AttributeKey<T> key) {
    return get(key)
        .orElseThrow(() -> new IllegalStateException("Required attribute " + key + " not found"));
  }
}

Contributor

@adnanhemani, Jan 7, 2026

I'm 50/50 on this being a requirement, but willing to go with @adutra's suggestion here :)

Contributor Author

Thanks, that makes sense. It is not compatible with using Immutables on PolarisEvent though: putAttribute() will not be generated, so we will have to stick with a manual builder.

@olsoloviov
Contributor Author

Thanks for your comprehensive comments, @adutra! I have implemented most of your suggestions in the new commit. The Immutables-generated builder does not work nicely with AttributeMap though, so I stuck with a manual builder.

Namespace namespace = event.requiredAttribute(EventAttributes.NAMESPACE);
String tableName = event.requiredAttribute(EventAttributes.TABLE_NAME);

org.apache.polaris.core.entity.PolarisEvent polarisEvent =
Contributor

nit: not sure why we had to write the full class hierarchy here. Was there a change?

Contributor Author

Well, yeah. This is another PolarisEvent from polaris-core. It was imported before, but now we had to import our org.apache.polaris.service.events.PolarisEvent, so now we have to fully qualify that one.

}

private static boolean isSubtypeOfAllowedType(Class<?> rawType) {
return ALLOWED_TYPES.stream().anyMatch(t -> t.isAssignableFrom(rawType));
Contributor

Not sure if this is too nit-picky - but maybe something like this to ensure we don't waste a lot of cycles going through this check for the same handful of classes?

private static final ClassValue<Boolean> IS_ALLOWED =
    new ClassValue<>() {
      @Override
      protected Boolean computeValue(Class<?> type) {
        for (Class<?> allowed : ALLOWED_TYPES) {
          if (allowed.isAssignableFrom(type)) {
            return true;
          }
        }
        return false;
      }
    };

private static boolean isSubtypeOfAllowedType(Class<?> rawType) {
  return IS_ALLOWED.get(rawType);
}

Contributor Author

Nice idea, thanks!

import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

/** Request filter that returns a 429 Too Many Requests if the rate limiter says so */
Contributor

Why remove the javadocs?

Contributor Author

Wow, this is weird, thanks for catching!

}

@Test
void testEqualsAndHashCode() {
Contributor

Some of the tests here are not useful anymore since the class is now a record.

ImmutablePolarisEventMetadata.builder().realmId(TEST_REALM).build();

@Test
void testBuilderCreatesEvent() {
Contributor

Same here, this test is not very useful anymore since the class is a record.

Others below aren't useful either because they are testing AttributeMap: they should be moved to AttributeMapTest.

Contributor

@adnanhemani left a comment

I'm ready to go on these changes. Fantastic job with this, @olsoloviov! Thank you for bearing through this massive PR with us :)

@github-project-automation github-project-automation bot moved this from PRs In Progress to Ready to merge in Basic Kanban Board Jan 12, 2026
@adutra
Contributor

adutra commented Jan 12, 2026

Thanks a lot for this awesome work @olsoloviov !

A minor improvement that comes to my mind right now: when there is no listener (e.g. if polaris.event-listener.type=no-op), or when the listener is not "interested" in all events (e.g. the CloudWatch listener), instantiating a PolarisEvent + PolarisEventMetadata + AttributeMap is useless. We should come up with a way to conditionally create events only when necessary. E.g.:

public Response createCatalog(
    CreateCatalogRequest request, RealmContext realmContext, SecurityContext securityContext) {
  if (polarisEventListener.isEventEnabled(PolarisEventType.BEFORE_CREATE_CATALOG)) {
    polarisEventListener.onEvent(
        new PolarisEvent(
            PolarisEventType.BEFORE_CREATE_CATALOG,
            eventMetadataFactory.create(),
            new AttributeMap()
                .put(EventAttributes.CATALOG_NAME, request.getCatalog().getName())));
  }
  Response resp = delegate.createCatalog(request, realmContext, securityContext);
  if (polarisEventListener.isEventEnabled(PolarisEventType.AFTER_CREATE_CATALOG)) {
    polarisEventListener.onEvent(
        new PolarisEvent(
            PolarisEventType.AFTER_CREATE_CATALOG,
            eventMetadataFactory.create(),
            new AttributeMap().put(EventAttributes.CATALOG, (Catalog) resp.getEntity())));
  }
  return resp;
}

Wdyt?
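
For completeness, the listener side of such a check might look like the sketch below. Everything here is hypothetical: `Listener`, `NoOpListener` and `SelectiveListener` are illustrative names, and `isEventEnabled` is only the method proposed in this comment, not an existing Polaris API.

```java
import java.util.EnumSet;
import java.util.Set;

enum EventType { BEFORE_CREATE_CATALOG, AFTER_CREATE_CATALOG, AFTER_CREATE_TABLE }

// Hypothetical listener contract: isEventEnabled is the check proposed above.
interface Listener {
  default boolean isEventEnabled(EventType type) {
    return true; // by default a listener receives everything
  }
}

// A no-op listener is never interested, so callers can skip event creation.
final class NoOpListener implements Listener {
  @Override
  public boolean isEventEnabled(EventType type) {
    return false;
  }
}

// A selective listener declares the event types it handles up front.
final class SelectiveListener implements Listener {
  private final Set<EventType> enabled = EnumSet.of(EventType.AFTER_CREATE_TABLE);

  @Override
  public boolean isEventEnabled(EventType type) {
    return enabled.contains(type);
  }
}

class ListenerDemo {
  public static void main(String[] args) {
    Listener listener = new SelectiveListener();
    for (EventType type : EventType.values()) {
      System.out.println(type + " enabled: " + listener.isEventEnabled(type));
    }
  }
}
```

An `EnumSet` lookup is cheap, so the guard should cost far less than building the event, its metadata and its attribute map.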

@olsoloviov
Contributor Author

olsoloviov commented Jan 12, 2026

> A minor improvement that comes to my mind right now: when there is no listener (e.g. if polaris.event-listener.type=no-op), or when the listener is not "interested" in all events (e.g. the CloudWatch listener), instantiating a PolarisEvent + PolarisEventMetadata + AttributeMap is useless. We should come up with a way to conditionally create events only when necessary.

Thanks for the suggestion, @adutra, I think it's quite reasonable. Unless you prefer it to be fixed in this PR, I think I can do it in a follow up PR, since it is somewhat orthogonal to the current improvement.

@adutra
Contributor

adutra commented Jan 12, 2026

> Thanks for the suggestion, @adutra, I think it's quite reasonable. Unless you prefer it to be fixed in this PR, I think I can do it in a follow up PR, since it is somewhat orthogonal to the current improvement.

I meant this for a follow-up PR 👍

@adutra
Contributor

adutra commented Jan 12, 2026

I'm leaving a few more hours for review. If no objections come up, I'll merge tomorrow morning my time (CET).

@adutra adutra merged commit ce627f9 into apache:main Jan 13, 2026
15 checks passed
@github-project-automation github-project-automation bot moved this from Ready to merge to Done in Basic Kanban Board Jan 13, 2026
evindj pushed a commit to evindj/polaris that referenced this pull request Jan 26, 2026
snazy added a commit to snazy/polaris that referenced this pull request Feb 11, 2026
* Flatten events hierarchy (apache#3293)

Co-authored-by: Alexandre Dutra <[email protected]>

* (feat)Python CLI: Switch from Poetry to UV for python package management (apache#3410)

* chore(deps): update dependency uv to v0.9.24 (apache#3430)

* (doc): Fix Polaris getting started doc and docker-compose (apache#3425)

* Fix Polaris getting started doc

* Fix Polaris getting started doc

* [Minor] [Site] fix scheduled meetings table (apache#3423)

* NoSQL: add to config-docs (apache#3397)

Add the NoSQL-specific configuration options to the configuration docs generation module.

* Blog: Add blog for Lance-Polaris integration (apache#3424)

* Add `--hierarchical` to Polaris CLI (apache#3426)

* Add `--hierarchical` to Polaris CLI

Following up on apache#3347 this change adds the `--hierarchical`
option to Polaris CLI in order to allow configuring this
storage flag in Azure-based Catalogs.

* Use new Request Context for each realm during implicit bootstrap (apache#3411)

* Use new Request Context for each realm during implicit bootstrap

The implicit (auto) bootstrap calls used to share Request Context
for potentially many realms. That used to work by coincidence because
`RealmConfig`, for example, is a `RequestScoped` bean.

With this change each realm will be bootstrapped in its own dedicated
Request Context.

This change lays down a foundation for future refactoring related to `RealmConfig`.

* Change nested docs to use title case (apache#3432)

* fix(deps): update dependency com.github.dasniko:testcontainers-keycloak to v4.1.1 (apache#3438)

* Fix Helm doc note section under Gateway API (apache#3436)

* Relax UV version (apache#3437)

* fix(deps): update dependency org.jboss.weld.se:weld-se-core to v6.0.4.final (apache#3439)

* Add free-disk-space action to regtest + spark_client_regtests (apache#3429)

The "Spark Client Regression Tests" CI job requires some disk space to operate. With just a little bit of added "content", the job fails with `no space left on device` during the `docker compose` invocation that builds an image. Such errors make it impossible to get the log from the workflow, unless you capture the log before the workflow runs into the `no space left on device` situation. With "no space left", GitHub workflow infra is unable to capture the logs.

```
 #10 ERROR: failed to copy files: userspace copy failed: write /home/spark/polaris/v3.5/integration/build/2.13/quarkus-build/gen/quarkus-app/lib/main/com.google.http-client.google-http-client-1.47.1.jar: no space left on device
```

This change is a stop-gap solution to prevent this error from happening for now.

* fix(deps): update dependency com.google.cloud:google-cloud-iamcredentials to v2.82.0 (apache#3449)

* Update OPA docker image version (apache#3448)

* Blog: Mapping Legacy and Heterogeneous Datalakes in Apache P… (apache#3417)

* fix(deps): update dependency org.postgresql:postgresql to v42.7.9 (apache#3453)

* chore(deps): update apache/spark docker tag to v3.5.8 (apache#3458)

* fix(deps): update dependency org.apache.spark:spark-sql_2.12 to v3.5.8 (apache#3450)

* site: add blog anchors (apache#3443)

* render anchor

* improve readme

* RAT

* fix(deps): update dependency com.google.cloud:google-cloud-storage-bom to v2.62.0 (apache#3455)

* Update renovate to include docker file with suffix (apache#3454)

* feat: Add trace_id to AWS STS session tags for end-to-end correlation (apache#3414)

* feat: Add trace_id to AWS STS session tags for end-to-end correlation

This change enables deterministic correlation between:
- Catalog operations (Polaris events)
- Credential vending (AWS CloudTrail via STS session tags)
- Metrics reports from compute engines (Spark, Trino, etc.)

Changes:
1. Add traceId field to CredentialVendingContext
   - Marked with @Value.Auxiliary to exclude from cache key comparison
   - Every request has unique trace ID, so including it in equals/hashCode
     would prevent all cache hits
   - Trace ID is for correlation/audit only, not authorization

2. Extract OpenTelemetry trace ID in StorageAccessConfigProvider
   - getCurrentTraceId() extracts trace ID from current span context
   - Populates CredentialVendingContext.traceId for each request

3. Add trace_id to AWS STS session tags
   - AwsSessionTagsBuilder includes trace_id in session tags
   - Appears in CloudTrail logs for correlation with catalog operations
   - Uses 'unknown' placeholder when trace ID is not available

4. Update tests to verify trace_id is included in session tags

This enables operators to correlate:
- Which catalog operation triggered credential vending
- Which data access events in CloudTrail correspond to catalog operations
- Which metrics reports correspond to specific catalog operations

* Update AwsCredentialsStorageIntegrationTest.java

* Review comments

  1. Feature Flag to Disable Trace IDs in Session Tags

   Added a new feature configuration flag INCLUDE_TRACE_ID_IN_SESSION_TAGS in FeatureConfiguration.java:
   polaris-core/src/main/java/org/apache/polaris/core/config/FeatureConfiguration.java (EXCERPT)
   public static final FeatureConfiguration<Boolean> INCLUDE_TRACE_ID_IN_SESSION_TAGS =
       PolarisConfiguration.<Boolean>builder()
           .key("INCLUDE_TRACE_ID_IN_SESSION_TAGS")
           .description("If set to true (and INCLUDE_SESSION_TAGS_IN_SUBSCOPED_CREDENTIAL is also true), ...")
           .defaultValue(false)
           .buildFeatureConfiguration();

   2. Cache Key Correctness Solution

   The solution ensures cache correctness by including trace IDs in cache keys only when they affect the vended credentials:

   Key changes:

     1. `StorageCredentialCacheKey` - Added a new traceIdForCaching() field that is populated only when trace IDs affect credentials:
   polaris-core/src/main/java/org/apache/polaris/core/storage/cache/StorageCredentialCacheKey.java (EXCERPT)
   @Value.Parameter(order = 10)
   Optional<String> traceIdForCaching();

     2. `StorageCredentialCache` - Reads both flags and includes trace ID in cache key only when both are enabled:
   polaris-core/src/main/java/org/apache/polaris/core/storage/cache/StorageCredentialCache.java (EXCERPT)
   boolean includeTraceIdInCacheKey = includeSessionTags && includeTraceIdInSessionTags;
   StorageCredentialCacheKey key = StorageCredentialCacheKey.of(..., includeTraceIdInCacheKey);

     3. `AwsSessionTagsBuilder` - Conditionally includes trace ID based on the new flag.

     4. Tests - Updated existing tests and added a new test testSessionTagsWithTraceIdWhenBothFlagsEnabled.

   How This Resolves the Cache Correctness vs. Efficiency Trade-off

   | Configuration | Trace ID in Session Tags | Trace ID in Cache Key | Caching Behavior |
   |---------------|--------------------------|----------------------|------------------|
   | Session tags disabled | No | No | Efficient caching |
   | Session tags enabled, trace ID disabled (default) | No | No | Efficient caching |
   | Session tags enabled, trace ID enabled | Yes | Yes | Correct but no caching across requests |

   This design ensures:
     • Correctness: When trace IDs affect credentials, they're included in the cache key
     • Efficiency: When trace IDs don't affect credentials, they're excluded from the cache key, allowing cache hits across requests

* Update CHANGELOG.md

Co-authored-by: Anand Kumar Sankaran <[email protected]>

* site: Update website for 1.3.0 (apache#3464)

* site: Fix blog diagram with corrected architecture image (apache#3466)

* Site: Add 20260108 Community Meeting (apache#3460)

* CI: CLI Nightly build (apache#3457)

* Fix Helm repository update after release vote (apache#3461)

The Github workflow included a `svn add index.yaml` command which would
be correct if it was a Git repository.  But in SVN, this results in an
error when the file is already under version control.  This line is
unnecessary and a simple `svn commit` results in pushing the changes to
the SVN server.

* Fix typo for the wrong reference (apache#3473)

* chore(deps): update apache/ozone docker tag to v2.1.0 (apache#3364)

* chore(deps): update docker.io/prom/prometheus docker tag to v3.9.1 (apache#3366)

* chore(deps): update quay.io/keycloak/keycloak docker tag to v26.5.1 (apache#3362)

* Last merged commit 1451ce4

---------

Co-authored-by: Oleg Soloviov <[email protected]>
Co-authored-by: Alexandre Dutra <[email protected]>
Co-authored-by: Yong Zheng <[email protected]>
Co-authored-by: Mend Renovate <[email protected]>
Co-authored-by: Danica Fine <[email protected]>
Co-authored-by: Jack Ye <[email protected]>
Co-authored-by: Dmitri Bourlatchkov <[email protected]>
Co-authored-by: Maninder <[email protected]>
Co-authored-by: Kevin Liu <[email protected]>
Co-authored-by: Anand K Sankaran <[email protected]>
Co-authored-by: Anand Kumar Sankaran <[email protected]>
Co-authored-by: Pierre Laporte <[email protected]>
Co-authored-by: JB Onofré <[email protected]>
Co-authored-by: Honah (Jonas) J. <[email protected]>
4 participants