Skip to content

Comments

Improve validation of remote execution support#1898

Merged
feluelle merged 4 commits intomainfrom
refactor/remote-execution-support
Jul 22, 2025
Merged

Improve validation of remote execution support#1898
feluelle merged 4 commits intomainfrom
refactor/remote-execution-support

Conversation

@feluelle
Copy link
Member

@feluelle feluelle commented Jul 18, 2025

Description

Describe the purpose of this pull request.

Currently, we allow passing remote execution fields on both standard and dedicated deployments. However, it is only truly supported at the moment for dedicated. Now, we always set the remote execution spec for hosted deployments, which means that if the user sets it on a standard deployment request, the api will return an error describing what's currently supported, vs silently not setting it.

🎟 Issue(s)

Related #1895

🧪 Functional Testing

List the functional testing steps to confirm this feature or fix.

❯ ./astro deployment create --remote-execution-enabled
Current Workspace: felix

Please specify a name for your Deployment

Deployment name: test

Please select a Region for your Deployment:
 #     REGION            
 1     eastus2           
 2     westus2           
 3     australiaeast     
 4     westeurope        

> 1
Error: Invalid field values: field remoteExecution failed check on excluded_unless; remoteExecution is excluded unless Executor is 'ASTRO';,field workerQueues failed check on workerQueuesRequiredForCelery;
❯ ./astro deployment create --remote-execution-enabled --executor ASTRO
Current Workspace: felix

Please specify a name for your Deployment

Deployment name: test

Please select a Region for your Deployment:
 #     REGION            
 1     westeurope        
 2     westus2           
 3     australiaeast     
 4     eastus2           

> 2
Error: Invalid request: Remote execution is only supported for deployments on dedicated clusters
❯ ./astro deployment create --remote-execution-enabled --executor ASTRO --type dedicated --cluster-id cm4sauxuo03ig01rfux2aun5d --resource-quota-cpu 2
Current Workspace: felix

Please specify a name for your Deployment

Deployment name: testss
Error: Invalid field values: field resourceQuotaCpu failed check on excluded_if; resourceQuotaCpu is excluded if RemoteExecution.Enabled is 'true';
❯ ./astro deployment create --remote-execution-enabled --executor ASTRO --type dedicated --cluster-id cm4sauxuo03ig01rfux2aun5d --resource-quota-memory 2
Current Workspace: felix

Please specify a name for your Deployment

Deployment name: asd
Error: Invalid field values: field resourceQuotaMemory failed check on excluded_if; resourceQuotaMemory is excluded if RemoteExecution.Enabled is 'true';
  • Create Celery deployment with default worker queues
  • Create Astro deployment with default worker queues
  • Create Astro Remote deployment from file

📸 Screenshots

Add screenshots to illustrate the validity of these changes.

📋 Checklist

  • Rebased from the main (or release if patching) branch (before testing)
  • Ran make test before taking out of draft
  • Ran make lint before taking out of draft
  • Added/updated applicable tests
  • Tested against Astro-API (if necessary).
  • Tested against Houston-API and Astronomer (if necessary).
  • Communicated to/tagged owners of respective clients potentially impacted by these changes.
  • Updated any related documentation

Currently, we allow passing remote execution fields on both standard and dedicated deployments. However, it is only truly supported at the moment for dedicated. Now, we always set the remote execution spec for hosted deployments, which means that if the user sets it on a standard deployment request, the api will return an error describing what's currently supported, vs silently not setting it.
@feluelle feluelle force-pushed the refactor/remote-execution-support branch from 3a0256c to ccf6626 Compare July 18, 2025 17:22
@feluelle feluelle self-assigned this Jul 21, 2025
@feluelle feluelle requested a review from a team July 21, 2025 10:45
Copy link
Contributor

@neel-astro neel-astro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some minor comments, but otherwise LGTM

}
}
switch schedulerSize {
switch schedulerSize { //nolint:dupl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be a utility method instead of using nolint:dupl?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not quite the same. There are a lot of dedicated vs standard references.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I think there is scope to improve still, generics could be one way to look at it. But not a blocker for this PR, as I already have a story incoming from the other nova to make these insane if/else blocks a bit simpler.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, 100%, but I don't want to increase the scope of this further. There's too much to clean up.

standardDeploymentRequest.SchedulerSize = astroplatformcore.CreateStandardDeploymentRequestSchedulerSizeEXTRALARGE
case "":
standardDeploymentRequest.SchedulerSize = astroplatformcore.CreateStandardDeploymentRequestSchedulerSize(configOption.DefaultValues.SchedulerSize)
if standardDeploymentRequest.RemoteExecution != nil && standardDeploymentRequest.RemoteExecution.Enabled {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Standard deployment should have remote execution as nil right? or am I missing something?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, theoretically, but the API spec doesn't tell us that. I discussed that with @ianbuss, and we think having less business logic (which it is if it is not part of the OpenAPI spec) is better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is better to surface the raw API error rather than a possible outdated or inaccurate error message.

}
}

func CreateDefaultTaskPodCPU(defaultTaskPodCPU string, isRemoteExecutionEnabled bool, configOption *astrocore.DeploymentOptions) *string {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should we add unit tests for these new functions? Or are we sure that the existing test cases cover all possible variations?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have manually tested some of them, but UTs make sense to me actually, because they are public and shared between deployment.go and from_file.go. I will add some. Thank you!

@feluelle feluelle requested a review from neel-astro July 21, 2025 17:35
}
}
switch schedulerSize {
switch schedulerSize { //nolint:dupl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: if you look at it, the case should be on the dedicated deployment request scheduler size small, and so on for the rest, i.e., dedicated instead of standard. Tiny thing in the existing codebase, but could lead to buggy code someday.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah, the refs are incorrect. Let's handle that on a different PR.

}
}
switch schedulerSize {
switch schedulerSize { //nolint:dupl
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, I think there is scope to improve still, generics could be one way to look at it. But not a blocker for this PR, as I already have a story incoming from the other nova to make these insane if/else blocks a bit simpler.

@feluelle feluelle merged commit 406ce70 into main Jul 22, 2025
4 of 5 checks passed
@feluelle feluelle deleted the refactor/remote-execution-support branch July 22, 2025 11:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants