Skip to content

Commit 9e69221

Browse files
authored
[Projects] Fix leader project creation flow (#9146)
### 📝 Description This PR addresses a few issues that were found in the project creation flow when MLRun is the project Leader: 1. Ensures that project permissions are properly established after project creation and storage operations, by adding a retry mechanism to wait for permission propagation before returning from the API endpoints, preventing race conditions where clients may immediately try to access the newly created project before permissions are fully available. 1. Note that this doesn't ensure both chief and worker have the updated copies, but it is a best effort and the delay margins between the two should be small enough. 2. When running operations on all followers (such as create/store project), run them on the `sorted` list of followers. When the followers are defined `['igz', 'nuclio']` this ensures the project policies are created before the project is created on Nuclio. 1. This is quite hacky, I agree, but until we provide a more robust project leader mechanism it will have to do. 3. Refactor iguazio v4's `store_project` to decide if it a "create" or "update" based on if `create_project` raises a 409 Conflict error. If conflict - it is an "update", if not - it is a "create". 1. This was needed because in the old implementation which used `get_project_policy_assignments`, we would get a 403 and not a 404, so we couldn't really tell if that 403 is because the project doesn't exist or we really don't have permissions. --- ### 🛠️ Changes Made - **Added `ensure_project_permissions()` method to `AuthVerifier`** (`server/py/framework/utils/auth/verifier.py`): - New async method that retries project read permission checks with a 1-second backoff and 10-second timeout - Handles race conditions where the auth provider may not immediately have permissions available after project creation - **Updated project endpoints** (`server/py/services/api/api/endpoints/projects.py`): - Added `ensure_project_permissions()` call after `create_project` completes (before returning 201) - Added `ensure_project_permissions()` call after `store_project` completes - **Refactored iguazio's `store_project`** (`server/py/framework/utils/clients/iguazio/v4.py`): - Resolving create or updated according to a 409 Conflict status - **Sorted followers list when running all on all followers** (`server/py/framework/utils/projects/leader.py`): - Minimal effort to ensure `igz` follower operations run before `nuclio` follwer. --- ### ✅ Checklist - [ ] I updated the documentation (if applicable) - [x] I have tested the changes in this PR - [ ] I confirmed whether my changes are covered by system tests - [ ] If yes, I ran all relevant system tests and ensured they passed before submitting this PR - [ ] I updated existing system tests and/or added new ones if needed to cover my changes - [ ] If I introduced a deprecation: - [ ] I followed the [Deprecation Guidelines](./DEPRECATION.md) - [ ] I updated the relevant Jira ticket for documentation --- ### 🧪 Testing - Manual testing to verify permissions are available immediately after project creation - Verified retry mechanism properly waits for permission propagation - Tested both in IG3 and IG4 --- ### 🔗 References - Ticket link: https://iguazio.atlassian.net/browse/IG4-1002, https://iguazio.atlassian.net/browse/IG4-1044 - Design docs links: - External links: --- ### 🚨 Breaking Changes? - [ ] Yes (explain below) - [x] No --- ### 🔍️ Additional Notes
1 parent 1fa3a07 commit 9e69221

File tree

5 files changed

+69
-13
lines changed

5 files changed

+69
-13
lines changed

server/py/framework/utils/auth/verifier.py

Lines changed: 26 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -20,8 +20,10 @@
2020

2121
import mlrun
2222
import mlrun.common.schemas as schemas
23+
import mlrun.utils.helpers
2324
import mlrun.utils.singleton
2425
from mlrun.common.types import AuthenticationMode
26+
from mlrun.utils import logger
2527

2628
import framework.utils.auth.providers.nop
2729
import framework.utils.auth.providers.opa
@@ -301,6 +303,30 @@ def _generate_resource_string_from_project_name(
301303
resource_namespace,
302304
)
303305

306+
async def ensure_project_permissions(
307+
self,
308+
project_name: str,
309+
auth_info: mlrun.common.schemas.AuthInfo,
310+
):
311+
"""
312+
Ensures project permissions are populated in the AuthVerifier
313+
"""
314+
315+
async def _check_project_read_permissions():
316+
await self.query_project_permissions(
317+
project_name,
318+
mlrun.common.schemas.AuthorizationAction.read,
319+
auth_info,
320+
)
321+
322+
await mlrun.utils.helpers.retry_until_successful_async(
323+
backoff=1,
324+
timeout=10,
325+
logger=logger,
326+
verbose=False,
327+
_function=_check_project_read_permissions,
328+
)
329+
304330
def _generate_resource_string_from_project_resource(
305331
self,
306332
resource_type: schemas.AuthorizationResourceTypes,

server/py/framework/utils/clients/iguazio/v4.py

Lines changed: 21 additions & 4 deletions
Original file line numberDiff line numberDiff line change
@@ -187,10 +187,27 @@ def store_project(
187187

188188
def _update_owner_or_create_policies():
189189
self._client.set_override_auth_headers(auth_info.request_headers)
190-
if self._project_policies_exist(project.metadata.name, auth_info):
191-
self.patch_project(session, name, project.dict(), auth_info=auth_info)
192-
else:
193-
self.create_project(session, project, auth_info=auth_info)
190+
try:
191+
# Try to create policies first
192+
self._client.create_default_project_policies(
193+
project=project.metadata.name
194+
)
195+
self._logger.info(
196+
"Successfully created default project policies in Iguazio"
197+
)
198+
except httpx.HTTPStatusError as exc:
199+
# If policies already exist (409 Conflict), update owner instead
200+
if exc.response.status_code == httpx.codes.CONFLICT:
201+
self._logger.debug(
202+
"Project policies already exist, updating owner instead",
203+
project=project.metadata.name,
204+
)
205+
self.patch_project(
206+
session, name, project.dict(), auth_info=auth_info
207+
)
208+
else:
209+
# Unexpected error, re-raise
210+
raise
194211

195212
self._try_callback_with_httpx_exceptions(
196213
_update_owner_or_create_policies,

server/py/framework/utils/projects/leader.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -397,7 +397,7 @@ def _run_on_all_followers(
397397
# TODO: do it concurrently
398398
follower_responses = {
399399
follower_name: getattr(follower, method)(*args, **kwargs)
400-
for follower_name, follower in self._followers.items()
400+
for follower_name, follower in sorted(self._followers.items())
401401
}
402402
if not leader_first:
403403
leader_response = getattr(self._leader_follower, method)(*args, **kwargs)

server/py/services/api/api/endpoints/projects.py

Lines changed: 12 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -73,6 +73,12 @@ async def create_project(
7373
)
7474
if is_running_in_background:
7575
return fastapi.Response(status_code=http.HTTPStatus.ACCEPTED.value)
76+
77+
await framework.utils.auth.verifier.AuthVerifier().ensure_project_permissions(
78+
project.metadata.name,
79+
auth_info,
80+
)
81+
7682
response.status_code = http.HTTPStatus.CREATED.value
7783
return project
7884

@@ -107,6 +113,12 @@ async def store_project(
107113
)
108114
if is_running_in_background:
109115
return fastapi.Response(status_code=http.HTTPStatus.ACCEPTED.value)
116+
117+
await framework.utils.auth.verifier.AuthVerifier().ensure_project_permissions(
118+
project.metadata.name,
119+
auth_info,
120+
)
121+
110122
return project
111123

112124

server/py/services/api/tests/unit/utils/clients/iguazio/test_iguazio_v4.py

Lines changed: 9 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -486,11 +486,12 @@ def test_store_project(
486486
):
487487
project = _generate_igv4_project(owner=owner)
488488

489-
if not project_exists:
490-
iguazio_client._client.get_project_policy_assignments.side_effect = (
489+
if project_exists:
490+
# Simulate 409 Conflict when trying to create project policies that already exist
491+
iguazio_client._client.create_default_project_policies.side_effect = (
491492
_generate_igv4_httpx_exception(
492-
"Project not found",
493-
httpx.codes.NOT_FOUND,
493+
"Project policies already exist",
494+
httpx.codes.CONFLICT,
494495
)
495496
)
496497

@@ -501,16 +502,16 @@ def test_store_project(
501502
igv4_auth_info.request_headers
502503
)
503504

504-
iguazio_client._client.get_project_policy_assignments.assert_called_once_with(
505+
# create_project is always called first (which calls create_default_project_policies)
506+
iguazio_client._client.create_default_project_policies.assert_called_once_with(
505507
project=TEST_PROJECT_NAME
506508
)
507509

508510
if not project_exists:
509-
iguazio_client._client.create_default_project_policies.assert_called_once_with(
510-
project=TEST_PROJECT_NAME
511-
)
511+
# New project: create succeeded, no need to update owner
512512
iguazio_client._client.update_project_owner.assert_not_called()
513513
else:
514+
# Existing project: 409 Conflict triggered patch_project
514515
if not owner:
515516
iguazio_client._client.update_project_owner.assert_not_called()
516517
else:

0 commit comments

Comments
 (0)