NoSQL: Add maintenance service to Polaris admin tool#3395
Conversation
dimas-b
left a comment
There was a problem hiding this comment.
LGTM 👍 just a few minor comments 🙂
| NoSqlMaintenanceRunCommand.class, | ||
| }, | ||
| mixinStandardHelpOptions = true, | ||
| description = "Polaris persistence maintenance.") |
There was a problem hiding this comment.
Should we flag it as "beta" as discussed in the last community sync call?.. or perhaps "preview" since the actual persistent storage that can be maintained via this command has not been hooked up yet (IIRC).
There was a problem hiding this comment.
I think we also probably shouldn't call it persistence maintenance unless we plan on having maintenance for other backends?
| @CommandLine.Command( | ||
| name = "run", | ||
| mixinStandardHelpOptions = true, | ||
| description = {"Run Polaris persistence maintenance."}) |
There was a problem hiding this comment.
maybe Run Polaris NoSQL persistence maintenance for the sake of clarity?
There was a problem hiding this comment.
... and similar description changes in the other new commands?
| # - MongoDb - configure the via the Quarkus extension | ||
| polaris.persistence.nosql.backend=InMemory | ||
|
|
||
| ## MongoDB version store specific configuration |
There was a problem hiding this comment.
MongoDB version store specific configuration defaults? I suppose the user can change any of these settings.
There was a problem hiding this comment.
Same comments request here as on the service pr, These properties should probably be commented out by default since they aren't used
| quarkus.mongodb.database=polaris | ||
| quarkus.mongodb.metrics.enabled=true | ||
| #quarkus.mongodb.connection-string=mongodb://localhost:27017 | ||
| quarkus.mongodb.devservices.enabled=false |
There was a problem hiding this comment.
Is it a runtime setting or build-time (the latter have a dedicated section)?
There was a problem hiding this comment.
Per https://quarkus.io/guides/mongodb-dev-services quarkus.mongodb.devservices.enabled is a build-time config. Please move to the Build-time configuration section (lines 20-38) for the sake of consistency.
|
|
||
| quarkus.arc.ignored-split-packages=\ | ||
| org.apache.polaris.admintool.config,\ | ||
| org.apache.polaris.admintool.maintenance,\ |
There was a problem hiding this comment.
Could we refactor this code to avoid split packages upfront?
There was a problem hiding this comment.
Note: this package no longer exists.
There was a problem hiding this comment.
Package org.apache.polaris.admintool.nosql.maintenance only exists in one module, that is in runtime/admin... Is this config required anymore? 🤔
There was a problem hiding this comment.
Actually, I see different split package warnings in my build:
2026-02-03T00:51:08.714249614Z build-15 WARN Detected a split package usage which is considered a bad practice and should be avoided. Following packages were detected in multiple archives:
- "org.apache.polaris.service.catalog.api" found in [org.apache.polaris:polaris-api-catalog-service:1.4.0-incubating-SNAPSHOT, org.apache.polaris:polaris-api-iceberg-service:1.4.0-incubating-SNAPSHOT]
- "org.apache.polaris.service.catalog.api.impl" found in [org.apache.polaris:polaris-api-catalog-service:1.4.0-incubating-SNAPSHOT, org.apache.polaris:polaris-api-iceberg-service:1.4.0-incubating-SNAPSHOT]
- "org.apache.polaris.service.types" found in [org.apache.polaris:polaris-api-catalog-service:1.4.0-incubating-SNAPSHOT, org.apache.polaris:polaris-api-iceberg-service:1.4.0-incubating-SNAPSHOT]
There was a problem hiding this comment.
I've added a commit to remove this prop.
There was a problem hiding this comment.
I'll deal with the other warnings on main later.
flyrain
left a comment
There was a problem hiding this comment.
I do not think the NoSQL maintenance tool belongs in the runtime admin module. NoSQL is just one persistence impl., and this maintenance logic is only applicable to that specific backend. Other persistence implementations do not need it, yet this change pulls NoSQL specific behavior into a shared admin layer. This creates unnecessary coupling, complicates the module boundaries, and makes the admin tooling harder to reason about and extend going forward.
cc @dennishuo @collado-mike @singhpk234 @jbonofre @HonahX @RussellSpitzer
|
The admin-tool is meant for all persistence implementations, because it is a tool to perform administrative tasks. We have discussed before that other persistence specific operations like schema setup for JDBC would go into the admin-tool. |
|
@flyrain :
NoSQL persistence (by design) needs periodic maintenance by admin users. Not providing a tool for this will mean that no Apache Polaris user will be able to use NoSQL persistence efficiently. As to the location of the tool, Admin CLI is a natural place as it hosts code for I believe this is the minimum the project community can do for OSS users. |
singhpk234
left a comment
There was a problem hiding this comment.
NoSQL maintenance service introduces constructs which are not agreed upon in Apache Polaris, neither have ever been discussed and agreed upon the community. for example retention expression, dependency on an unmaintained project nessie library here mentioned here #3268 (review), other than question to where this should be in Polaris tools or Polaris repo has not been agreed upon, despite being called out.
It has been requested to start dev thread discussion to establish concencus for the same, despite that underlying PR was merged
#3268 (comment)
To avoid any confusion and explicitly mark my objection i am requesting changes, kindly please start thread establish concencus upon the semantics that Polaris should support .
| out.println("use the 'help maintenance' command."); | ||
| out.println(); | ||
|
|
||
| checkInMemory(); |
There was a problem hiding this comment.
Shouldn't we be failing for everything that isn't NoSql not just InMemory?
| checkInMemory(); | ||
|
|
||
| out.println(); | ||
| out.println("Information: selected NoSql persistence backend: " + backend.type()); |
There was a problem hiding this comment.
In the current formulation this isn't correct since you can call this with any backend right?
| @Inject protected MaintenanceConfig maintenanceConfig; | ||
|
|
||
| protected void checkInMemory() { | ||
| if ("InMemory".equals(backend.type())) { |
There was a problem hiding this comment.
Do we need a check here for the persistence not being NoSQL?
|
As discussed on the mailing list, I think this is fine to add into the Admin tool but the current PR needs a bit of work to make it clear to end users that it is just for NoSQL. |
| import picocli.CommandLine; | ||
|
|
||
| @CommandLine.Command( | ||
| name = "nosql", |
RussellSpitzer
left a comment
There was a problem hiding this comment.
LGTM +1, I notice there are still some comments from @dimas-b though that are still not addressed?
I can agree to disagree here, I don't need to agree here either (dismissing my request changes), as we definitely want to, imho this tool doesn't belong in admin module, specially strictly speaking from the MetaStorageManager interface POV bootstrap and purge are well defined in this manage interface. and admin tool documentation its for metastore management.
| NoSqlMaintenanceRunCommand.class, | ||
| }, | ||
| mixinStandardHelpOptions = true, | ||
| description = "Polaris NoSQL persistence.") |
There was a problem hiding this comment.
nit: suggestion: Sub-commands specific to NoSQL persistence.
There was a problem hiding this comment.
I took the liberty to push this update from my side. @snazy : I hope you do not mind.
| import picocli.CommandLine; | ||
|
|
||
| @CommandLine.Command( | ||
| name = "maintenance-info", |
There was a problem hiding this comment.
nit: Maybe add maintenance as another sub-command group?
Current: java -jar ... nosql maintenance-info
Proposed: java -jar ... nosql maintenance info
I hope the latter in nicer in terms of UX.
| quarkus.mongodb.database=polaris | ||
| quarkus.mongodb.metrics.enabled=true | ||
| #quarkus.mongodb.connection-string=mongodb://localhost:27017 | ||
| quarkus.mongodb.devservices.enabled=false |
There was a problem hiding this comment.
Per https://quarkus.io/guides/mongodb-dev-services quarkus.mongodb.devservices.enabled is a build-time config. Please move to the Build-time configuration section (lines 20-38) for the sake of consistency.
| quarkus.index-dependency.agrona.group-id=org.agrona | ||
| quarkus.index-dependency.agrona.artifact-id=agrona |
There was a problem hiding this comment.
I tried building locally without these settings (with the workaround to get build warnings), but agrona was not flagged as needing an index... Is this setting required for the admin tool?
There was a problem hiding this comment.
I re-ran my local build after the recent rebase without this property and I did not see any build warnings about Agrona being unindexed.
@snazy : WDYT about removing this property?
|
Summarizing the dev discussion.
|
|
@flyrain : Would you please reconsider your -1 on this PR given the above discussion? |
|
@RussellSpitzer : just FYI: I re-requested your review my accident 😅 but fresh comments are welcome, of course 🙂 |
|
@snazy : could you rebase this PR? I suspect some of the config concerns are related to the other NoSQL modules merged to |
292cec4 to
3472306
Compare
3472306 to
a1a65a6
Compare
a87bfdc to
366aaae
Compare
There was a problem hiding this comment.
I believe this PR is good to merge based on the discussion in this PR and related dev email threads.
If anyone still sees disadvantages in merging it now, please send a quick recap of what you think those disadvantages might be (it is hard to track all discussion threads and I might have missed something).
dennishuo
left a comment
There was a problem hiding this comment.
(Belated) Thanks @dimas-b for the summary of the dev discussion in #3395 (comment) !
Based on that summary, I'm happy with the core content of the code in this PR to merge-and-iterate. To facilitate that, just a few things would be nice before merging to keep the most useful information intact for posterity:
- Can we update the PR description with the main takeaways from the dev discussion and also link to the dev discussion for posterity so that future folks who track down this PR when figuring things out can easily see the main reference materials without sifting through the really long PR comment threads?
- Can we add user-facing documentation along the same vein as https://github.com/apache/polaris/pull/3818/changes to actually see at a glance what the admin tool invocation will look like with the full nested subcommand and what it looks like to pass in arguments regarding retention time, etc? Normally I'd also expect the sample commands in the PR description of the PR that adds the new commands.
- If the example commands include usage of CEL, to make a note in that documentation that the "overly" expressive CEL syntax might be re-evaluated in the near future so the user shouldn't depend on it if possible (or else they should also join the dev discussion about the CEL functionality).
- Add a link to the aforementioned GH issue referred to in #3395 (comment) into the PR description or doc comments that explain the uncertain fate of CEL for easy navigation
Some of this request for docs follows from my comment in #3268 (review) - I understand the response there was to add docs "later", but since this PR is the one that actually establishes the user interaction model in terms of admin-tool commands, it's nice to pair the addition of the command with instructions for how it works.
| NoSqlMaintenanceRunCommand.class, | ||
| }, | ||
| mixinStandardHelpOptions = true, | ||
| description = "Polaris NoSQL persistence.") |
| }, | ||
| mixinStandardHelpOptions = true, | ||
| description = "Polaris NoSQL persistence.") | ||
| public class NoSqlCommand extends BaseNoSqlCommand { |
There was a problem hiding this comment.
Just a question for my own understanding, since I'm not too familiar with how the command hierarchy works - how does this "sibling" command end up causing the nesting discussed on the dev list? It looks like the hierarchy is:
NoSqlMaintenanceRunCommand extends BaseNoSqlMaintenanceCommand
BaseNoSqlMaintenanceCommand extends BaseNoSqlCommand
BaseNoSqlCommand extends BaseCommand
While
NoSqlCommand extends BaseNoSqlCommand
Thus appears to be a "sibling" of BaseNoSqlMaintenanceCommand rather than being a "parent".
Some documentation on the expected full example commands would also be helpful to reason about this; could we add some to the PR description and/or readme/md docs?
There was a problem hiding this comment.
Reading through the code to answer my own question :)
I guess the subcommands tuple basically "intercepts" the other command classes rather than requiring those other command classes to know anything about being nested under a parent command?
There was a problem hiding this comment.
AFAIK, command nesting is derived from annotations. Implementation classes are free to extend other classes just to share code.
There was a problem hiding this comment.
@dimas-b is right.
Java type hierarchy is orthogonal to picocli commands / annotations.
|
@dennishuo : Here's the issue for CEL follow-up: #3847 Re: Admin CLI docs, would you mind if they were added after this PR? The latest docs pages are still useful even if they do not make it to the same release as the binaries. |
|
@dennishuo : I updated the PR discription... Is it better? :) |
|
Added some docs for the web site. |
| Alternatively, you can run maintenance at high or the default unlimited scan rate during off-peak hours. | ||
|
|
||
| {{< alert note >}} | ||
| The `nosql maintenance-run` command may emit a warning that not all purgable database entried have been deleted. |
dennishuo
left a comment
There was a problem hiding this comment.
Thanks for the doc updates! LGTM
This change wires the maintenance service up to the Polaris admin tool.
The new commands are grouped under the
nosqlparent command. This should provide enough isolation for end users who do not use NoSQL Persistence.The initial NoSQL maintenance replies on CEL expressions for configuration. This will be reworked in follow-up PRs under #3847.
The Admin Tool remains a collection of utilities for low-level manipulation of the Polaris database, such as bootstrapping and maintenance. It targets admin users, so ordinary API / Catalog users are not expected to be affected by introducing NoSQL commands into the Admin CLI. Administrators are expected to be familiar with the multitude of Persistence backends and should not be confused by the new NoSQL-specific commands.
Another area for follow-up (related to modularizing the Admin CLI for better dependency management) is covered by #3855.