Added interface for reporting metrics#2887
Conversation
adutra
left a comment
There was a problem hiding this comment.
Thanks a lot for this contribution @cccs-cat001! I left a few comments mostly around API design.
runtime/service/src/main/java/org/apache/polaris/service/reporting/LoggingMetricsReporter.java
Outdated
Show resolved
Hide resolved
| @Override | ||
| public void reportMetric( | ||
| String prefix, String namespace, String table, ReportMetricsRequest reportMetricsRequest) { | ||
| // Do Nothing |
There was a problem hiding this comment.
- Would it make sense to define and maintain a set of Micrometer metrics and increment them here?
- Would it make sense to fire a
PolarisEventhere?
(I don't really know what an IRC server is supposed to do with these scan or commit metrics 😅)
There was a problem hiding this comment.
I'm not sure what those are 😅 my intentions were to have the default behaviour remain the same, and adding that may change the default behaviour
There was a problem hiding this comment.
😄 - no worries, we can look into that later.
There was a problem hiding this comment.
My recommendation was to store them in persistence, so that later we can serve them later, I recommend the same to @pingtimeout https://lists.apache.org/thread/koogjvb73hbl9ospmm7owsdx8oh67wk1
There was a problem hiding this comment.
I guess we would need to purge that table frequently. Not sure if it's a good default to store in persistence without providing the purge functionality. Could that be an alternate CDI bean?
There was a problem hiding this comment.
Agree, though It can be any persistence, not necessarily an OLTP imho an OLAP is a valid consideration too, so is dumping in object storge so analytical engines can make a table on top of from Polaris POV, it seems to me that we are settling on catalog name and tble identifiers in addition to report from interface POV, why is realm-id not part of it is my doubt, why is authenticate prinicipals not part of these metric ?
runtime/service/src/main/java/org/apache/polaris/service/reporting/MetricsReporter.java
Outdated
Show resolved
Hide resolved
runtime/service/src/main/java/org/apache/polaris/service/reporting/MetricsReporter.java
Outdated
Show resolved
Hide resolved
...ce/src/test/java/org/apache/polaris/service/reporting/MetricsReportingConfigurationTest.java
Outdated
Show resolved
Hide resolved
runtime/service/src/main/java/org/apache/polaris/service/reporting/LoggingMetricsReporter.java
Outdated
Show resolved
Hide resolved
dimas-b
left a comment
There was a problem hiding this comment.
Thanks for your interest in Polaris and contributing, @cccs-cat001 !
I'm not sure how this PR relates to the /metrics endpoint in the Iceberg REST API, though 🤔 Could you clarify, please?
| @RequestScoped | ||
| public MetricsReporter metricsReporter( | ||
| MetricsReportingConfiguration config, @Any Instance<MetricsReporter> reporters) { | ||
| return reporters.select(Identifier.Literal.of(config.type())).get(); |
There was a problem hiding this comment.
config.type() is fixed in runtime... Can this bean be @ApplicationScoped?
There was a problem hiding this comment.
This producer is annotated @ApplicationScoped but it returns a PolarisMetricsReporter, which is @RequestScoped. That’s problematic in CDI because an @ApplicationScoped bean lives for the entire application lifecycle, while a @RequestScoped bean only lives for a single request.
If the reporter (or a future implementation) ever injects request-scoped beans like RealmContext or SecurityContext, this setup would either fail at deployment or produce invalid references once the request ends.
Instead of producing a reporter directly in an app-scoped producer, introduce a small factory:
@ApplicationScoped
public class PolarisMetricsReporterFactory {
public PolarisMetricsReporter getOrCreate(String type) {
}
}Similar to MetaStoreManagerFactory, which ensure the scope correct, while avoid creating a new object for every request. It is also more flexible when we introduce more reporter types.
There was a problem hiding this comment.
A factory is ApplicationScoped(or Singleton), but a bean can be ApplicationScoped or RequestScoped:
- Should the bean exist only during the lifespan of a request? If so, use @RequestScoped.
- Should the bean exist during the lifespan of the application and should the state of the bean be shared between all the requests? If so, use @ApplicationScoped.
So, to use ApplicationScoped here, a factory is probably clearer.
There was a problem hiding this comment.
I'm -1 on the factory approach, The factory approach in Polaris is generally used for creating and caching realm-scoped beans. The producer method approach OTOH is used for selecting beans at runtime according to their @Identifier.
Here, it's clearly an application-scoped bean, selectable at runtime with its @Identifier.
The correct fix for me is just to replace @RequestScoped with @ApplicationScoped in the producer method:
@ApplicationScoped
public MetricsReporter metricsReporter(...)There was a problem hiding this comment.
The correct fix for me is just to replace
@RequestScopedwith@ApplicationScopedin the producer method:
+1
There was a problem hiding this comment.
@cccs-cat001 please change the scope of the DefaultMetricsReporter bean to also @ApplicationScoped – there is no reason for it to be request-scoped.
Once this change is done we should be good 🤞
There was a problem hiding this comment.
Actually, given this producer method, I do not think DefaultMetricsReporter needs a scope annotation at all. It is always created via this method.
There was a problem hiding this comment.
Okay I see, I think you're right and we don't need the annotation. Other examples with @ApplicationScope in the ServiceProducer follow the same pattern.
There was a problem hiding this comment.
Actually, given this producer method, I do not think DefaultMetricsReporter needs a scope annotation at all. It is always created via this method.
Yes and no :-) It doesn't need the annotation but 1) all other beans in the same situation are annotated and 2) annotating the bean allows for direct injection, e.g. in a test:
@Inject @Identifier("default") MetricsReporter defaultMetricsReporter;So, I'm rather in favor of anotating both places, even if I understand that this can cause some confusion.
There was a problem hiding this comment.
Annotating both places SGTM 👍 Using @ApplicationScoped, right?
If that can help: I was also puzzled initially, but we are actually talking about this IRC endpoint: polaris/spec/iceberg-rest-catalog-open-api.yaml Line 1288 in f7b5646 |
runtime/service/src/main/java/org/apache/polaris/service/reporting/LoggingMetricsReporter.java
Outdated
Show resolved
Hide resolved
...ce/src/test/java/org/apache/polaris/service/reporting/MetricsReportingConfigurationTest.java
Outdated
Show resolved
Hide resolved
...ce/src/test/java/org/apache/polaris/service/reporting/MetricsReportingConfigurationTest.java
Outdated
Show resolved
Hide resolved
adutra
left a comment
There was a problem hiding this comment.
Almost there! Just a few minor issues.
runtime/service/src/main/java/org/apache/polaris/service/reporting/DefaultMetricsReporter.java
Outdated
Show resolved
Hide resolved
...e/service/src/test/java/org/apache/polaris/service/reporting/DefaultMetricsReporterTest.java
Outdated
Show resolved
Hide resolved
...e/service/src/test/java/org/apache/polaris/service/reporting/DefaultMetricsReporterTest.java
Outdated
Show resolved
Hide resolved
runtime/service/src/main/java/org/apache/polaris/service/reporting/DefaultMetricsReporter.java
Outdated
Show resolved
Hide resolved
...ests/src/main/java/org/apache/polaris/service/it/test/PolarisRestCatalogIntegrationBase.java
Outdated
Show resolved
Hide resolved
Co-authored-by: Alexandre Dutra <[email protected]>
…ting/DefaultMetricsReporter.java Co-authored-by: Alexandre Dutra <[email protected]>
…ting/DefaultMetricsReporter.java Co-authored-by: Alexandre Dutra <[email protected]>
…test/PolarisRestCatalogIntegrationBase.java Co-authored-by: Alexandre Dutra <[email protected]>
|
I'm staring at the logs of the two failed checks and I'm not really sure what the cause of the errors are. Do you have any advice? 😅 |
You have 4 files without license header: The other error occurred while pulling a Docker image. It's a transient error. |
|
@cccs-cat001 One last thing and we are good to go: do you mind adding this blurb to the CHANGELOG.md file? ### Highlights
- Support for [Iceberg Metrics Reporting] has been introduced in Polaris. Out of the box, metrics can
be printed to the logs by setting the `org.apache.polaris.service.reporting` logger level to `INFO` (it's
set to `OFF` by default). Custom reporters can be implemented and configured to send metrics to
external systems for further analysis and monitoring.
[Iceberg Metrics Reporting]: https://iceberg.apache.org/docs/latest/metrics-reporting/ |
.../service/src/main/java/org/apache/polaris/service/catalog/iceberg/IcebergCatalogAdapter.java
Outdated
Show resolved
Hide resolved
dimas-b
left a comment
There was a problem hiding this comment.
Thanks for bearing with me @cccs-cat001 🙂
|
As @adutra noted, let's wait at least until tomorrow for other people to have a chance at reviewing. Side note: we may want to add Authorization checks for metric reporting (in a follow-up PR, of course). |
runtime/service/src/main/java/org/apache/polaris/service/reporting/PolarisMetricsReporter.java
Show resolved
Hide resolved
runtime/service/src/main/java/org/apache/polaris/service/reporting/PolarisMetricsReporter.java
Show resolved
Hide resolved
runtime/service/src/main/java/org/apache/polaris/service/reporting/DefaultMetricsReporter.java
Show resolved
Hide resolved
runtime/service/src/main/java/org/apache/polaris/service/reporting/DefaultMetricsReporter.java
Show resolved
Hide resolved
runtime/service/src/main/java/org/apache/polaris/service/reporting/DefaultMetricsReporter.java
Show resolved
Hide resolved
runtime/service/src/main/java/org/apache/polaris/service/reporting/DefaultMetricsReporter.java
Show resolved
Hide resolved
|
Are we ok to merge? I see a lot of recent comments resolved, but no fresh approvals 😅 |
| import org.slf4j.Logger; | ||
| import org.slf4j.LoggerFactory; | ||
|
|
||
| @RequestScoped |
There was a problem hiding this comment.
Can we remove this scope annotation if this is meant to be application scoped? More details are here, https://github.com/apache/polaris/pull/2887/files#r2487347077
|
The current CI failure actually demonstrates that we must annotate the CDI bean with something – otherwise, it's not a CDI bean and so it's not included in the alternatives passed to the producer method. TLDR: |
|
TIL: re: bean annotations 😅 Like I commented on the specific thread (above) I'm ok with annotating both the bean and the producer with |
|
Okay final answer (hopefully!!!) |
flyrain
left a comment
There was a problem hiding this comment.
LGTM. Thanks a lot for adding this, @cccs-cat001 !
| import org.apache.iceberg.catalog.TableIdentifier; | ||
| import org.apache.iceberg.metrics.MetricsReport; | ||
|
|
||
| public interface PolarisMetricsReporter { |
There was a problem hiding this comment.
Not a blocker: would the name IcebergMetricsReporter be more descriptive as this is only for iceberg tables, other tables managed by Polaris won't be able to report the metrics?
There was a problem hiding this comment.
Good point. But let's merge this PR (the author has been kind enough to address our collective feedback for two weeks now) and address this one in a follow-up task.
There was a problem hiding this comment.
there was another TODO mentioned here #2887 (comment)
If my time allows I'd be happy to help contribute that, and if we can get the requirements defined beforehand :)
|
Thanks you so much @cccs-cat001 for your contribution – and your patience 😅 |
Co-authored-by: Alexandre Dutra <[email protected]>
* fix typo in Ozone getting-started guide (apache#2975) * NoSQL: Persistence API (apache#2965) Provides the persistence API parts for the NoSQL persistence work. Most of this PR are purely interfaces and annotations. It consists of a low-level `Persistence` interface to read and write individual `Obj`ects and the corresponding pluggable `ObjType`s. The API module also contains upstream SPIs for database specific implementations and downstream APIs for atomic commits and indexes. Also some CDI specific infrastructure helper annotations. Unit tests cover the few pieces of actual executable code in this change. This change also adds a README, which references functionality and modules that are not in this PR, but already provide an overview of the overall interactions. * Handle poetry lock update (apache#2942) * Update dependency com.azure:azure-sdk-bom to v1.3.2 (apache#2979) * Added interface for reporting metrics (apache#2887) Co-authored-by: Alexandre Dutra <[email protected]> * Prefer RealmConfig fields (apache#2971) minor cleanup of verbose code * Update community meetings note links due to document split (apache#2981) * NoSQL: database agnostic implementation + in-memory backend (apache#2977) This change contains the database agnostic implementation plus the in-memory backend used for testing purposes, and a Junit extension. These three modules are difficult to put into isolated PRs. The "main implementation" contains the commit-logic, indexes-logic and the caching part. `PersistenceImplementation` is (more or less) a wrapper providing higher-level functionality backed by a database's `Backend` implementation. The latter provides the bare minimum functionality. Other implementations of the `Persistence` interface are just to transparently add caching and commit-attempt specific case. No call site needs to bother about the actual implementation and/or its layers. Tests in the `polaris-persistence-nosql-impl` module use the in-memory backend via the Junit extension. Common tests for all backends, in-memory in this PR and MongoDB in a follow-up, are in the testFixtures of the `polaris-persistence-nosql-impl`. * Fix incorrect column name in ModelEvent (apache#2987) Fixes apache#2913 * Rename request ID header (apache#2988) * Update dependency org.agrona:agrona to v2.3.1 (apache#2986) * Update actions/stale digest to fad0de8 (apache#2984) * Last merged commit 291bd7d --------- Co-authored-by: Dmitri Bourlatchkov <[email protected]> Co-authored-by: Yong Zheng <[email protected]> Co-authored-by: Mend Renovate <[email protected]> Co-authored-by: cccs-cat001 <[email protected]> Co-authored-by: Alexandre Dutra <[email protected]> Co-authored-by: Christopher Lambert <[email protected]> Co-authored-by: JB Onofré <[email protected]>
What changes were proposed in this pull request?
I'm adding an interface to be able to code for the
/metricsendpoint in the Iceberg REST API Spec.Why are the changes needed?
The metrics endpoint in Polaris is currently a no-op endpoint. In our organization we use it for auditing who is reading/writing tables, as this endpoint is called after every read/write with Iceberg. I added a default no-op implementation so that the behaviour remains the same, and I added a logging implementation to show how it could be used.
Does this PR introduce any user-facing change?
it adds an optional configuration for telling the metrics endpoint how it should behave. the default behaviour is no change, however if the user sets the
polaris.metrics.reporting.typetologgingit will log the MetricsReport to INFO. There is also the ability to implement the interface for your own needs if necessary.How was this patch tested?
Tests were added. it was also tested in a staging environment.
CHANGELOG.md