Comparing changes

base repository: dstackai/dstack
base: 0.19.30
head repository: dstackai/dstack
compare: 0.19.31
  • 12 commits
  • 56 files changed
  • 4 contributors

Commits on Sep 25, 2025

  1. d622241

Commits on Sep 26, 2025

  1. Kubernetes: request resources according to RequirementsSpec (#3127)

    Other fixes and improvements:
    
    * Handle errors in `_create_jump_pod_service_if_not_exists`
    * Check both Service and Pod to decide if the jump pod
      must be (re)created
    * Respect `Node.status.nodeInfo.architecture`
    * Add `namespace` option to the backend config
    
    Part-of: #3126
    un-def authored Sep 26, 2025
    430552b
  2. ef698be

Commits on Sep 29, 2025

  1. Support A4 instances with the B200 GPU on GCP (#3100)

    This implementation allows provisioning both
    individual A4 instances and clusters, but clusters
    do not yet support high-speed networking, since it
    requires a
    [different network setup](https://cloud.google.com/ai-hypercomputer/docs/create/create-vm#setup-network).
    jvstme authored Sep 29, 2025
    9c51df8
  2. [Docs] Minor fixes

    peterschmidt85 committed Sep 29, 2025
    840ce36
  3. Move USER to dstack project list --verbose (#3134)

    Only show the `USER` column in
    `dstack project list` if `--verbose` is passed.
    In my setup, where 9 projects are configured, this
    speeds up `dstack project list` from 20 seconds to
    2 seconds.
    jvstme authored Sep 29, 2025
    f90259b
  4. daa3d03

Commits on Sep 30, 2025

  1. [Backward incompatible] Rename properties in Kubernetes backend config (#3137)
    
    * networking -> proxy_jump
    * ssh_host -> hostname
    * ssh_port -> port
    
    In addition, the `dstack-` prefix has been added to jump pod and service
    names for consistency with job pods and services.
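
For configs written against 0.19.30, the renames above amount to a mechanical key rewrite. The following is a minimal sketch, not part of dstack: it assumes the backend config has already been parsed into a plain dict and that `ssh_host` and `ssh_port` are nested under `networking`. The helper name `migrate_kubernetes_backend` and the sample address are hypothetical.

```python
from copy import deepcopy

# Old name -> new name, as listed in the commit message.
SECTION_RENAME = ("networking", "proxy_jump")
FIELD_RENAMES = {"ssh_host": "hostname", "ssh_port": "port"}


def migrate_kubernetes_backend(config: dict) -> dict:
    """Return a copy of a backend config dict with the renamed keys applied.

    Hypothetical helper: assumes `ssh_host`/`ssh_port` are nested under
    `networking`. Configs that already use the new names pass through unchanged.
    """
    migrated = deepcopy(config)
    old_section, new_section = SECTION_RENAME
    if old_section in migrated:
        section = migrated.pop(old_section) or {}
        migrated[new_section] = {
            FIELD_RENAMES.get(key, key): value for key, value in section.items()
        }
    return migrated


# Illustrative input only; the address is a documentation placeholder.
old = {"type": "kubernetes", "networking": {"ssh_host": "203.0.113.10", "ssh_port": 32000}}
new = migrate_kubernetes_backend(old)
print(new)
```

The input dict is copied rather than mutated, so the old config can be kept for comparison.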
    
    Closes: #3136
    Co-authored-by: peterschmidt85
    un-def and peterschmidt85 authored Sep 30, 2025
    85faee6

Commits on Oct 2, 2025

  1. Support GCP A4 clusters (#3142)

    This commit implements provisioning GCP A4
    clusters with high-performance RoCE networking.
    
    ```shell
    > dstack fleet
     FLEET  INSTANCE  BACKEND         RESOURCES                                          PRICE    STATUS  CREATED
     gpu    0         gcp (us-west2)  cpu=224 mem=3968GB disk=100GB B200:180GB:8 (spot)  $51.552  idle    21 mins ago
            1         gcp (us-west2)  cpu=224 mem=3968GB disk=100GB B200:180GB:8 (spot)  $51.552  idle    17 mins ago
    ```
    
    To enable high-performance networking, users need
    to create the
    [appropriate networks](https://cloud.google.com/ai-hypercomputer/docs/create/create-vm#setup-network)
    and configure them in the backend settings.
    
    ```yaml
    projects:
    - name: main
      backends:
      - type: gcp
        project_id: my-project
        creds:
          type: default
        vpc_name: my-vpc-0  # regular, 1 subnet
        extra_vpcs:
        - my-vpc-1  # regular, 1 subnet
        roce_vpcs:
        - my-vpc-mrdma  # RoCE profile, 8 subnets
    ```
    
    Then apply a fleet configuration.
    
    ```yaml
    type: fleet
    nodes: 2
    placement: cluster
    availability_zones: [us-west2-c]
    backends: [gcp]
    resources:
      gpu: 8:b200
    ```
    
    Each instance in the cluster will then have 10
    network interfaces:
    - 1 regular interface in the main VPC (`default`
      or the one configured in `vpc_name`).
    - 1 regular interface in a VPC configured in
      `extra_vpcs`.
    - 8 RDMA interfaces in the VPC configured in
      `roce_vpcs`.
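
The interface count above can be checked with simple arithmetic. This sketch assumes each entry in `extra_vpcs` contributes one regular interface and each entry in `roce_vpcs` contributes 8 RDMA interfaces, which matches the example config but is not confirmed as a general rule. The function `expected_interfaces` is hypothetical.

```python
def expected_interfaces(extra_vpcs: list[str], roce_vpcs: list[str],
                        rdma_per_roce_vpc: int = 8) -> int:
    """Estimate the number of network interfaces per cluster instance.

    Assumption: 1 interface in the main VPC, 1 per extra VPC, and
    `rdma_per_roce_vpc` RDMA interfaces per RoCE VPC.
    """
    main_vpc_interfaces = 1
    return main_vpc_interfaces + len(extra_vpcs) + rdma_per_roce_vpc * len(roce_vpcs)


# Values taken from the example backend config in this commit message.
total = expected_interfaces(extra_vpcs=["my-vpc-1"], roce_vpcs=["my-vpc-mrdma"])
print(total)  # 10
```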
    
    Additionally, this commit optimizes the fetching
    and caching of subnets, so that they are fetched
    from the API only once, and not separately for
    each item in `extra_vpcs`. For some instance
    types, this reduces the number of API requests
    from 9 to 1, which cuts about 16 seconds from each
    offer provisioning attempt.
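
The effect of fetching subnets once can be illustrated with a small sketch. It is not dstack's code: `FakeSubnetAPI` is a hypothetical stand-in for a cloud client that counts requests, and the 9 VPCs are chosen only to mirror the numbers in the commit message.

```python
from collections import defaultdict


class FakeSubnetAPI:
    """Stand-in for a cloud API client; counts how many list calls are made."""

    def __init__(self, subnets):
        self._subnets = subnets  # list of (vpc_name, subnet_name) pairs
        self.calls = 0

    def list_subnets(self):
        self.calls += 1
        return list(self._subnets)


def subnets_per_vpc_naive(api, vpcs):
    # One API request per VPC, filtering the full result each time.
    return {vpc: [s for v, s in api.list_subnets() if v == vpc] for vpc in vpcs}


def subnets_per_vpc_cached(api, vpcs):
    # One API request in total; group the result by VPC and reuse it.
    by_vpc = defaultdict(list)
    for vpc, subnet in api.list_subnets():
        by_vpc[vpc].append(subnet)
    return {vpc: by_vpc[vpc] for vpc in vpcs}


vpcs = [f"vpc-{i}" for i in range(9)]
subnets = [(vpc, f"{vpc}-subnet") for vpc in vpcs]

naive_api = FakeSubnetAPI(subnets)
cached_api = FakeSubnetAPI(subnets)
naive_result = subnets_per_vpc_naive(naive_api, vpcs)
cached_result = subnets_per_vpc_cached(cached_api, vpcs)
print(naive_api.calls, cached_api.calls)  # 9 1
```

Both functions return the same mapping; only the number of requests differs.
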
    jvstme authored Oct 2, 2025
    f7ef485
  2. Kubernetes: add multi-node support (#3141)

    * Discover and set instance's internal_ip (PodIP)
    * Fix region mismatch
    * Add `privileged: true` support
    * [runner] Set RLIMIT_MEMLOCK to unlimited. Fixes issues with
      InfiniBand/RDMA
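
To see what `RLIMIT_MEMLOCK` looks like from inside a job, the limit can be read with Python's standard `resource` module (Unix only). This sketch only inspects the current process limit and does not reproduce the runner's change. The helper names are hypothetical.

```python
import resource


def memlock_limits() -> tuple[int, int]:
    """Return the (soft, hard) RLIMIT_MEMLOCK values for the current process."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


def is_unlimited(limit: int) -> bool:
    # RLIM_INFINITY is the sentinel the OS uses for "no limit".
    return limit == resource.RLIM_INFINITY


soft, hard = memlock_limits()
print("soft unlimited:", is_unlimited(soft), "hard unlimited:", is_unlimited(hard))
```

RDMA libraries typically need to pin memory, so a finite locked-memory limit is a common cause of registration failures.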
    
    Part-of: #3126
    un-def authored Oct 2, 2025
    8a72c8c
  3. 3dbd68b
  4. [Docs] Improve Kubernetes documentation (#3138)

    * [Docs] Kubernetes guide
    
    Rework `Backends` and `Fleets` pages to reflect the changes related to Kubernetes
    
    * [Docs] Improve Kubernetes documentation
    
    Updated `README`, `Overview`, `Installation`
    
    * [Docs] Improve Kubernetes documentation
    
    Minor updates, incl. the description of `Default image`, and `privileged` for NCCL tests
    
    * [Docs] Improve Kubernetes documentation
    
    Updated `FAQ`
    peterschmidt85 authored Oct 2, 2025
    6201c2f