Skip to content

Comments

feat: initial Operator::from_uri implementation#5482

Draft
jorgehermo9 wants to merge 38 commits intoapache:mainfrom
jorgehermo9:feature/operator-from-uri
Draft

feat: initial Operator::from_uri implementation#5482
jorgehermo9 wants to merge 38 commits intoapache:mainfrom
jorgehermo9:feature/operator-from-uri

Conversation

@jorgehermo9
Copy link
Contributor

@jorgehermo9 jorgehermo9 commented Dec 30, 2024

Relates to #5445

Left some doubts as // TODO in the source. I have little experience contributing to this repo so I'm sorry if there are a lot of doubts about this. Just want to be sure all the changes of this PR aligns with the current codebase. Please take a look to all the TODOs I left when reviewing.

I would like to add more tests, but I don't know in which place those should be placed. The core/tests folder seems like a good place, but I don't find any place suitable, as placing those in core/tests/behaviour seems weird to me. But as this implies various components, maybe we can have a core/tests/integration? Although I would like to write some unit tests at core/src/types/builder.rs, core/src/types/operator/builder.rs and core/src/types/operator/registry.rs, but didn't any existing unit tests there.

In this PR I implemented a single Configurator::from_uri method, which will serve as default and takes only the query parameters as options. Services which need a more specific configuration such as s3 or azblob can be implemented in follow-up PRs.

I also have pending documentating all the newly added public API, but will do that after an initial review round.

Thank you very much.

@jorgehermo9 jorgehermo9 requested a review from Xuanwo as a code owner December 30, 2024 19:41
@github-actions github-actions bot added the releases-note/feat The PR implements a new feature or has a title that begins with "feat" label Dec 30, 2024
@@ -0,0 +1,226 @@
use std::cell::LazyCell;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this file inside the crate::types::operator::registry module. Is it ok? I thought about adding a crate::types::operator_registry, but it seemed better this way.


// TODO: thread local or use LazyLock instead? this way the access is lock-free
// TODO: should we expose the `GLOBAL_OPERATOR_REGISTRY` as public API at `crate::types::operator::GLOBAL_OPERATOR_REGISTRY`?
thread_local! {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the preferred way of having a global static variable such as this?

I prefer to have it thread_local so there is not need for a LazyLock, we can use LazyCell instead (LazyCell is lock-free but LazyLock is not, it synchronizes access through threads)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just curious, why do we need static var here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In order to have a global initialized registry and to not have to initialize it every time whenever we want to parse a uri

GLOBAL_OPERATOR_REGISTRY.with(|registry| registry.parse(uri, options))

// TODO: thread local or use LazyLock instead? this way the access is lock-free
// TODO: should we expose the `GLOBAL_OPERATOR_REGISTRY` as public API at `crate::types::operator::GLOBAL_OPERATOR_REGISTRY`?
thread_local! {
pub static GLOBAL_OPERATOR_REGISTRY: LazyCell<OperatorRegistry> = LazyCell::new(|| OperatorRegistry::with_enabled_services());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MSRV check fails due to the usage of LazyCell. Should we update the MSRV or use another thing instead?

I see this TODO about the once_cell crate usage:

# TODO: remove once_cell when lazy_lock is stable: https://doc.rust-lang.org/std/cell/struct.LazyCell.html

If the replacement is planned, I think it would be better to use LazyCell than once_cell::Lazy in new code like this one, to not increase technical debt.

image

@asukaminato0721
Copy link
Contributor

asukaminato0721 commented May 17, 2025

so many "todo"...

keep the code change small, clippy happy, test pass, and the code style unified (you can use llm for this kind of idea)

then you can answer many todos by yourself. :)

@jorgehermo9
Copy link
Contributor Author

Thank you very much for your review @asukaminato0721! I usually leave the TODOs in code as comments to the reviewer, will address everything asap!

fn from_uri(uri: &Uri, options: impl IntoIterator<Item = (String, String)>) -> Result<Self> {
let query_pairs = uri.query().map(query_pairs).unwrap_or_default();

let merged_options = query_pairs.into_iter().chain(options);
Copy link
Contributor Author

@jorgehermo9 jorgehermo9 May 22, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unsure about
let merged_options = query_pairs.into_iter().chain(options);
or
let merged_options = options.chain(query_pairs.into_iter());

Which one should take precedence? query_pairs should overwrite the options, or should the options overwrite the query_pairs? I'm thinking on the case that query_pairs and options contain the same key

Right now, I think that the behaviour with query_pairs.into_iter().chain(options) is that options takes precedence and them overwrite whatever goes in query_pairs if a key is present in both

Comment on lines +37 to +39
pub use registry::OperatorFactory;
pub use registry::OperatorRegistry;
pub use registry::GLOBAL_OPERATOR_REGISTRY;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we expose these 3 as public api @Xuanwo? or should we pub(crate) them instead?

Or should we only expose OperatorFactory and OperatorRegistry but not GLOBAL_OPERATOR_REGISTRY?

@jorgehermo9
Copy link
Contributor Author

Hi @Xuanwo, I worked a bit more on this and I think it is ready for another review round. Once the implementation looks good to you, I will work on tests and documentation, is this okay?

@asukaminato0721 I think I addressed most of your comments

}

/// TODO: document this.
fn from_uri(uri: &Uri, options: impl IntoIterator<Item = (String, String)>) -> Result<Self> {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it okay to accept &Uri here? This way, we don't have to parse the uri again (it is only done here right now)

Or should we change it to &str and parse the uri twice?

@Xuanwo
Copy link
Member

Xuanwo commented Jun 4, 2025

Sorry, @jorgehermo9, for keeping you waiting so long. I plan to start working on this soon.

The main issue preventing us from moving forward smoothly is that we haven’t yet decided how to split the crates and design our registry.

I think we can break this feature into two parts: one for splitting opendal, which will divide opendal into opendal, opendal-core, opendal-services-xxx, and opendal-layers-xxx as we discussed in #5206. In this phase, we’ll figure out how to split opendal and design a registry mechanism that is easy to use and works well for opendal after the split. Based on that, we can design a from_uri API that can reuse our existing framework. At that point, adding the from_uri API should be simple and won’t introduce any breaking changes.

I led you down the wrong path because I used to think it was fine to add from_uri first and then perform the splits. However, during the review of this PR, I finally realized that I had misunderstood these two processes. In fact, we can't have a proper from_uri unless we know how the opendal split and opendal registry will look.


I think we can keep this PR as a draft, and I'll invite you to join the discussion about the upcoming OpenDAL split RFC. I believe your experience in implementing this PR could be really valuable.

@jorgehermo9
Copy link
Contributor Author

No problem at all @Xuanwo, I completely understand that. Thank you very much for your review and thoughts, and I'll be happy to discuss it on the RFC. I'm leaving the PR as a draft for future reference until the RFC is done, but feel free to close it if you want.

@jorgehermo9 jorgehermo9 marked this pull request as draft June 6, 2025 06:49
@Xuanwo
Copy link
Member

Xuanwo commented Jun 6, 2025

No problem at all @Xuanwo, I completely understand that. Thank you very much for your review and thoughts, and I'll be happy to discuss it on the RFC.

Thank you very much for your understanding!

. I'm leaving the PR as a draft for future reference until the RFC is done, but feel free to close it if you want.

Yes, let's keep this open as a draft.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

releases-note/feat The PR implements a new feature or has a title that begins with "feat" size:L This PR changes 100-499 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants