@Bihan Bihan commented Jul 30, 2025

Notes:

  1. Hotaisle offers only two CPU configurations for its instance types: 8-core and 13-core. Other fields are not configurable.
  2. A custom HotAisleProvider is added in HotAisleCompute because Hotaisle requires both api_key and team_handle.
  3. Issue: as with Lambda, the instance becomes unreachable after a dstack server restart.

Bihan commented Jul 30, 2025

How to test

1. Add hotaisle creds

type: hotaisle
team_handle: <TEAM_HANDLE>
creds:
  type: api_key
  api_key: <API_KEY>

2. GPUHunt
Use https://github.com/Bihan/gpuhunt/tree/add_hotaisle_vm

3. Serve Qwen

type: service
name: qwen-vllm-amd

image: rocm/vllm:latest
env:
  - MODEL_ID=Qwen/Qwen2.5-7B-Instruct
  - MAX_MODEL_LEN=4096

commands:
  - vllm serve $MODEL_ID --max-model-len $MAX_MODEL_LEN

port: 8000
model: Qwen/Qwen2.5-7B-Instruct

resources:
  gpu: MI300X:1

@Bihan Bihan requested a review from peterschmidt85 July 30, 2025 10:42
@peterschmidt85 peterschmidt85 requested a review from jvstme July 30, 2025 11:46
@jvstme jvstme left a comment

@Bihan, sorry for the delayed review. The PR looks good overall, but please see my suggestions about some details

Comment on lines 28 to 43
def _validate_user_and_team(self) -> None:
    url = f"{API_URL}/user/"
    response = self._make_request("GET", url)

    if response.ok:
        user_data = response.json()
    else:
        response.raise_for_status()

    teams = user_data.get("teams", [])
    if not teams:
        raise ValueError("No Hotaisle teams found for this user")

    available_teams = [team["handle"] for team in teams]
    if self.team_handle not in available_teams:
        raise ValueError(f"Hotaisle Team '{self.team_handle}' not found.")

(optional) This validation is already better than in most of our backends, but we can improve it further by also validating the roles assigned to the key, so that users see permission-related errors earlier: when configuring the backend rather than when creating instances.

It should be possible to validate everything (the key, the user role, and the team roles) by calling only GET /user/api_keys/{prefix}/
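
A minimal sketch of what such a role check could look like once the key details have been fetched. The response shape here (`user_roles`, and `teams` entries with `handle` and `roles`) and the role names in the usage note are assumptions made for illustration; the actual schema returned by GET /user/api_keys/{prefix}/ is not shown in this thread and should be checked against the Hot Aisle API docs. `find_permission_problems` is a hypothetical helper, not an existing dstack function.

```python
from typing import Any, Dict, Iterable, List


def find_permission_problems(
    key_info: Dict[str, Any],
    team_handle: str,
    required_user_roles: Iterable[str],
    required_team_roles: Iterable[str],
) -> List[str]:
    """Return human-readable problems; an empty list means the key looks usable.

    key_info is assumed (hypothetically) to look like:
    {"user_roles": [...], "teams": [{"handle": "...", "roles": [...]}]}
    """
    problems: List[str] = []

    # Check the roles granted to the key at the user level.
    granted_user_roles = set(key_info.get("user_roles", []))
    missing_user = sorted(set(required_user_roles) - granted_user_roles)
    if missing_user:
        problems.append(f"API key is missing user roles: {', '.join(missing_user)}")

    # Find the configured team among the teams the key can access.
    teams = {team["handle"]: team for team in key_info.get("teams", [])}
    team = teams.get(team_handle)
    if team is None:
        problems.append(f"API key has no access to team '{team_handle}'")
        return problems

    # Check the roles granted to the key within that team.
    missing_team = sorted(set(required_team_roles) - set(team.get("roles", [])))
    if missing_team:
        problems.append(
            f"API key is missing roles in team '{team_handle}': {', '.join(missing_team)}"
        )
    return problems
```

With this shape, the backend configurator could call the helper once, for example `find_permission_problems(key_info, "acme", ["user"], ["vm_admin"])` with made-up role names, and report every returned problem at configuration time.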

Bihan (Collaborator, Author) replied:

Thank you. Will plan to update it in the next iteration.

@jvstme jvstme left a comment

@Bihan, thank you. I've left some nit-picking comments about the latest changes, but feel free to address them in a separate PR or ignore them; they are not that important. I think the PR is good to merge now.


class HotAisleInstanceBackendData(CoreModel):
    ip_address: str
    vm_id: Optional[str] = None

(nit) This field is unused, so I wouldn't include it in the model

Comment on lines +33 to +43
except ValueError as e:
    error_message = str(e)
    if "No Hot Aisle teams found" in error_message:
        raise_invalid_credentials_error(
            fields=[["creds", "api_key"]],
            details="Valid API key but no teams found for this user",
        )
    elif "not found" in error_message:
        raise_invalid_credentials_error(
            fields=[["team_handle"]], details=f"Team handle '{self.team_handle}' not found"
        )

(nit) Looking for patterns in our own error messages and then raising with another error message looks quite redundant. It's also error-prone, because we can change the error message in _validate_user_and_team and forget to change it here.

Some alternatives I can suggest:

  • Raise with the same error message - raise_invalid_credentials_error(details=str(e), ...)
  • Call raise_invalid_credentials_error directly in _validate_user_and_team
  • (my favorite) Merge validate_api_key and _validate_user_and_team into one method and call raise_invalid_credentials_error directly
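
As a rough illustration of the last option, here is a minimal, self-contained sketch of a single validation function that raises the final error directly, so no message matching is needed. `InvalidCredentialsError` is a stand-in for whatever raise_invalid_credentials_error raises in dstack, and `fetch_user` is a hypothetical callable wrapping the GET {API_URL}/user/ request; neither is the PR's actual code.

```python
from typing import Any, Callable, Dict, List


class InvalidCredentialsError(Exception):
    """Stand-in (assumption) for the error raised by raise_invalid_credentials_error."""

    def __init__(self, fields: List[List[str]], details: str) -> None:
        super().__init__(details)
        self.fields = fields
        self.details = details


def validate_credentials(
    fetch_user: Callable[[], Dict[str, Any]], team_handle: str
) -> None:
    """Validate the API key and team handle in one place.

    fetch_user is a hypothetical callable that performs GET {API_URL}/user/
    and returns the decoded JSON body (raising on HTTP errors).
    """
    user_data = fetch_user()
    teams = user_data.get("teams", [])

    # The key is valid but not attached to any team: blame the api_key field.
    if not teams:
        raise InvalidCredentialsError(
            fields=[["creds", "api_key"]],
            details="Valid API key but no teams found for this user",
        )

    # The key works, but the configured team is not among the user's teams.
    if team_handle not in [team["handle"] for team in teams]:
        raise InvalidCredentialsError(
            fields=[["team_handle"]],
            details=f"Team handle '{team_handle}' not found",
        )
```

Because each failure is raised with its final fields and details at the point where it is detected, changing a message cannot silently break a string match elsewhere.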


logger = get_logger(__name__)

MAX_INSTANCE_NAME_LEN = 60

(nit) Unused

@Bihan Bihan merged commit 7cb3124 into dstackai:master Aug 7, 2025
25 checks passed
@Bihan Bihan mentioned this pull request Aug 13, 2025