
@TomerShor (Member) commented Jan 1, 2026

📝 Description

This PR addresses a few issues found in the project creation flow when MLRun is the project leader:

  1. Ensure that project permissions are in place after project create and store operations by adding a retry mechanism that waits for permissions to propagate before the API endpoints return. This prevents a race condition in which a client accesses the newly created project immediately, before its permissions are available.
    1. Note that this does not guarantee that both the chief and the worker have the updated copies. It is a best effort, and the delay between the two should be small enough.
  2. When running an operation on all followers (such as create/store project), iterate over the followers in sorted order. With the followers defined as ['igz', 'nuclio'], this ensures the project policies are created (by the igz follower) before the project is created on Nuclio.
    1. This is quite hacky, I agree, but until we provide a more robust project leader mechanism it will have to do.
  3. Refactor iguazio v4's store_project to decide whether it is a "create" or an "update" based on whether create_project raises a 409 Conflict error: on a conflict it is an "update", otherwise it is a "create".
    1. This was needed because the old implementation used get_project_policy_assignments, which returned a 403 rather than a 404, so we could not tell whether the 403 meant that the project does not exist or that we genuinely lack permissions.
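A minimal sketch of items 2 and 3 together, assuming hypothetical in-memory followers: `FakeFollower`, `ConflictError` and `run_on_all_followers` are illustrative names, not MLRun's actual API. For brevity both fakes share the create-then-update logic, although the PR describes it only for the iguazio v4 client.

```python
class ConflictError(Exception):
    """Hypothetical stand-in for an HTTP 409 Conflict response."""


class FakeFollower:
    """Hypothetical in-memory follower; real followers call Iguazio / Nuclio over HTTP."""

    def __init__(self, name, call_log):
        self.name = name
        self.projects = {}
        self.call_log = call_log  # shared log, used here only to show execution order

    def create_project(self, project_name, body):
        if project_name in self.projects:
            raise ConflictError(project_name)
        self.projects[project_name] = body
        self.call_log.append((self.name, "create", project_name))

    def update_project(self, project_name, body):
        self.projects[project_name] = body
        self.call_log.append((self.name, "update", project_name))

    def store_project(self, project_name, body):
        # Try to create first; a 409 Conflict means the project already exists,
        # so the same request is treated as an update.
        try:
            self.create_project(project_name, body)
            return "created"
        except ConflictError:
            self.update_project(project_name, body)
            return "updated"


def run_on_all_followers(followers, method_name, *args):
    # Visit followers in sorted name order: "igz" sorts before "nuclio",
    # so project policies exist before Nuclio sees the project.
    return {
        name: getattr(followers[name], method_name)(*args)
        for name in sorted(followers)
    }


log = []
followers = {
    "nuclio": FakeFollower("nuclio", log),
    "igz": FakeFollower("igz", log),
}
first = run_on_all_followers(followers, "store_project", "my-proj", {"owner": "admin"})
second = run_on_all_followers(followers, "store_project", "my-proj", {"owner": "admin"})
```

The first call creates the project on both followers and the second one falls back to an update; in both rounds `igz` runs before `nuclio`, even though `nuclio` was inserted into the dict first.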

🛠️ Changes Made

  • Added ensure_project_permissions() method to AuthVerifier (server/py/framework/utils/auth/verifier.py):

    • New async method that retries project read permission checks with a 1-second backoff and 10-second timeout
    • Handles race conditions where the auth provider may not immediately have permissions available after project creation
  • Updated project endpoints (server/py/services/api/api/endpoints/projects.py):

    • Added ensure_project_permissions() call after create_project completes (before returning 201)
    • Added ensure_project_permissions() call after store_project completes
  • Refactored iguazio's store_project (server/py/framework/utils/clients/iguazio/v4.py):

    • Decides between create and update based on whether create_project returns a 409 Conflict status
  • Sorted the followers list when running operations on all followers (server/py/framework/utils/projects/leader.py):

    • Minimal effort to ensure igz follower operations run before nuclio follower operations.
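A minimal sketch of the retry loop described for ensure_project_permissions(), using the 1-second backoff and 10-second timeout from the description. It assumes a hypothetical async `check_read_permission` callable that raises a hypothetical `PermissionDeniedError` until the auth provider has the new project policies. Re-raising after the timeout is also an assumption, and the real method may use an existing MLRun retry helper instead of a hand-rolled loop.

```python
import asyncio
import time


class PermissionDeniedError(Exception):
    """Hypothetical stand-in for the auth provider rejecting a request."""


async def ensure_project_permissions(
    check_read_permission,
    project_name,
    backoff_seconds=1.0,
    timeout_seconds=10.0,
):
    """Retry a project read-permission check until it passes or the timeout expires.

    `check_read_permission` is a hypothetical async callable that raises
    PermissionDeniedError while the new project policies have not propagated yet.
    Returns the number of attempts it took.
    """
    deadline = time.monotonic() + timeout_seconds
    attempts = 0
    while True:
        attempts += 1
        try:
            await check_read_permission(project_name)
            return attempts
        except PermissionDeniedError:
            # Give up (re-raise) if another backoff would pass the deadline.
            if time.monotonic() + backoff_seconds > deadline:
                raise
            await asyncio.sleep(backoff_seconds)


def make_flaky_checker(failures_before_success):
    """Hypothetical checker that is denied N times, then succeeds."""
    state = {"calls": 0}

    async def check(project_name):
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise PermissionDeniedError(project_name)

    return check


# Short backoff so the example runs quickly; the PR uses 1 s / 10 s.
attempts = asyncio.run(
    ensure_project_permissions(make_flaky_checker(2), "my-proj", backoff_seconds=0.01)
)
```

Here the check is denied twice and passes on the third attempt, so `attempts` ends up as 3. A check that never succeeds within `timeout_seconds` propagates the last `PermissionDeniedError` to the caller.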

✅ Checklist

  • I updated the documentation (if applicable)
  • I have tested the changes in this PR
  • I confirmed whether my changes are covered by system tests
    • If yes, I ran all relevant system tests and ensured they passed before submitting this PR
    • I updated existing system tests and/or added new ones if needed to cover my changes
  • If I introduced a deprecation:

🧪 Testing

  • Manual testing to verify permissions are available immediately after project creation
  • Verified retry mechanism properly waits for permission propagation
  • Tested both in IG3 and IG4

🔗 References


🚨 Breaking Changes?

  • Yes (explain below)
  • No

🔍️ Additional Notes

@liranbg (Member) left a comment


🚀

@elbamit (Contributor) left a comment


2 questions:

  • Why do you check read permissions? How does that help if the user performs a non-read operation on the project resources?
  • Is this change only relevant for project creation/storing operations? What about other resources/operations?

@TomerShor (Member, Author) commented

@elbamit

  • Why do you check read permissions? How does that help if the user performs a non-read operation on the project resources?

It's a minimal check, only to verify that OPA has received the changes in the manifest.
A project owner who has just created the project will certainly have read permissions, so if OPA doesn't have the project policies yet, the read check fails.

  • Is this change only relevant for project creation/storing operations? What about other resources/operations?

It's a must for create / store, to allow accessing the project at all.
Deleting - even if there is a delay of a few seconds at most, I think that's fine.
Updating / Patching - the only change relevant to policies is updating the owner, but the user performing the update is almost certainly not the new owner (it's the old owner), so we cannot really verify it via OPA, and it is not a critical flow.

@TomerShor TomerShor changed the title [Projects] Ensure project permissions are applied on creation/storing [Projects] Fix project creation flow Jan 1, 2026
@TomerShor TomerShor changed the title [Projects] Fix project creation flow [Projects] Fix leader project creation flow Jan 2, 2026
@TomerShor TomerShor merged commit 9e69221 into mlrun:development Jan 4, 2026
18 of 24 checks passed
@TomerShor TomerShor deleted the ensure-project-permissions branch January 4, 2026 06:58