
RFC: Dynamic Disks Support in BOSH#1401

Merged
beyhan merged 1 commit into cloudfoundry:main from mariash:dynamic-disks
Mar 24, 2026

Conversation

@mariash
Member

@mariash mariash commented Jan 8, 2026

This PR adds the RFC "Dynamic Disks Support in BOSH".

For easier viewing, you can see the full RFC as a rendered preview.

@mariash mariash force-pushed the dynamic-disks branch 2 times, most recently from edb5daa to 83fa9cd on January 8, 2026 at 18:26
@beyhan beyhan requested review from a team, Gerg, beyhan, cweibel, rkoster and stephanme and removed request for a team January 9, 2026 10:18
@beyhan beyhan added rfc CFF community RFC toc labels Jan 9, 2026
@rkoster
Contributor

rkoster commented Jan 9, 2026

Architectural Concerns: Runtime Dependencies, Security Model, and Adoption Path

This RFC proposes significant shifts in BOSH's operational and security model that need to be addressed before proceeding with API design.


1. BOSH Director in the Runtime Critical Path

Current State:
The BOSH Director functions purely as a control plane. Once workloads are deployed, the Director can be completely unavailable without impacting running applications. Cloud Foundry, Kubernetes clusters, and other BOSH-deployed systems continue operating normally during Director downtime.

Proposed State:
This RFC introduces runtime dependencies on the Director. Workloads that need to dynamically provision or attach disks will fail if the Director is unavailable. This expands the blast radius of Director outages from "can't deploy/scale" to "application-level failures."

Blocker: There is no open-source high-availability solution for BOSH Director. The Director is typically deployed as a single instance and is not designed for the uptime guarantees that runtime-critical infrastructure requires. This RFC should be contingent on an HA Director solution existing first, unless the design is changed such that the Director is no longer in the critical path of workload operations.


2. Security Architecture Change

Current State:
Workload clusters (e.g., Kubernetes) do not need network access to the BOSH Director API, nor do they require Director credentials. The Director is an infrastructure concern, isolated from the workloads it manages.

Proposed State:
This RFC requires workload clusters to:

  • Have network connectivity to the Director API
  • Hold credentials authorized to call Director endpoints
  • Make authenticated API calls at runtime

This is a substantial change to the security boundary. Workloads now become potential attack vectors against the Director. A compromised Kubernetes cluster could potentially:

  • Manipulate disks across deployments (depending on authorization scope)
  • Launch denial-of-service attacks against the Director API
  • Leverage Director credentials for lateral movement

The RFC's authorization model is underspecified - it mentions "authorized clients" but doesn't detail how credentials are scoped, rotated, or isolated per workload.


3. No OSS Consumer

There is no open-source BOSH release identified as an adopter of this feature. For the community to take on the additional complexity this RFC introduces - both in the Director codebase and in operational requirements - there should be a concrete plan for at least one OSS BOSH release to adopt dynamic disks. Without this, the feature adds maintenance burden with no clear benefit to the community.

@metskem
Contributor

metskem commented Jan 9, 2026

Agree with Ruben's statements.
The RFC introduces a dangerous dependency on the BOSH Director, and making the Director highly available would introduce a lot of extra complexity.
The requested functionality (disk provisioning) looks more like something to be solved with a Service Broker.

@mariash
Member Author

mariash commented Jan 9, 2026

@rkoster @metskem to address the concerns:

  1. First of all, this feature is completely opt-in. By default no new API is exposed, and no additional jobs can be scheduled or executed. Existing BOSH behavior remains unchanged. Disk management jobs are executed by separate Director workers, whose number is configurable and defaults to 0.
  2. On your first point about Director availability and runtime dependency: how this feature is used depends on the consumer implementation. For example, Kubernetes CSI controllers are built around a reconcile-and-retry model. If the Director API is temporarily unavailable, new operations will be delayed, but the system will converge to the desired state once the Director returns. While having an HA Director would be ideal, the lack of one should not block this opt-in feature with eventual-consistency semantics.
  3. On the security considerations: consumers of the disk management API, like the Disk Provisioner, can be deployed in a secure and isolated way. They do not need to run alongside workloads and can be deployed in a restricted environment, or even collocated with the BOSH Director on the same VM.
  4. There is currently no OSS consumer for this. This feature enables future Kubernetes integration and strengthens BOSH's role as a single control plane for IaaS resources, with disk management integrated into the VM lifecycle and protected from conflicting VM operations. Historically, Cloud Foundry and BOSH have accepted opt-in features that served the needs of specific community members or commercial use cases, provided they didn't change the default behavior and the maintainers were willing to support them. We are prepared to take responsibility for the ongoing maintenance of this feature. Keeping it in BOSH allows future orchestrators to benefit from a shared implementation rather than encouraging private solutions outside of the project.
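The reconcile-and-retry behavior described in point 2 can be sketched as a simple loop with exponential backoff. This is an illustrative sketch only; the names (`DirectorUnavailable`, `reconcile`, `flaky_director`) are invented for the example and are not part of any CSI controller or BOSH API:

```python
import time

class DirectorUnavailable(Exception):
    """Raised when the Director API cannot be reached (hypothetical)."""

def reconcile(desired_disks, director_call, max_attempts=5, base_delay=0.01):
    """Retry the Director call with exponential backoff until it succeeds.

    Returns the attempt count on success; re-raises after max_attempts.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            director_call(desired_disks)
            return attempt
        except DirectorUnavailable:
            if attempt == max_attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))

# Fake Director that is down for the first two calls, then recovers.
calls = {"n": 0}
def flaky_director(disks):
    calls["n"] += 1
    if calls["n"] <= 2:
        raise DirectorUnavailable("503")

attempts = reconcile(["disk-1"], flaky_director)
```

The point of the sketch is that a temporary Director outage only delays convergence; the loop succeeds once the Director returns.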

@rkoster
Contributor

rkoster commented Jan 10, 2026

@mariash Thank you for your response. Could you provide some recent examples (links to PRs or commits) to support:

Historically, Cloud Foundry and BOSH have accepted opt-in features that served the needs of specific community members or commercial use cases.

@mariash
Member Author

mariash commented Jan 12, 2026

@rkoster here is a recent example: cloudfoundry/routing-release#451

@mariash
Member Author

mariash commented Jan 12, 2026

I updated the proposal with the potential BOSH CSI implementation. That CSI should be straightforward to implement and would be beneficial for the BOSH community. We would like to keep the BOSH changes upstream if possible.

@Alphasite
Contributor

I wanted to call out what I see as the positive security implications here. In practice, the alternative to this feature isn’t “no dynamic disks” — it’s implementing dynamic disks outside BOSH, which tends to push cloud permissions into the workload/tenant boundary.

That means we move from “IaaS privileges live in the Director (which is already the trusted component for IaaS access)” to “each workload environment needs some form of IaaS permission,” whether that’s static keys, instance roles, or workload identity. Even with the more modern options, you’re still granting cloud-level capabilities inside a boundary that’s harder to secure and audit consistently.

I’ve seen this play out in PKS-era Kubernetes (and its OSS KUBO release) on BOSH using cloud CSI plugins: it worked, but every cluster needed cloud permissions to provision/attach volumes. That increased blast radius — compromising the cluster control plane could translate into cloud-level capabilities — and it increased operational burden because we had to ship/patch five CSIs (one per IaaS) in the tenant environment.

This design centralizes privileged disk operations back into the Director and enables narrowly scoped UAA access for consumers (e.g., disk operations only), which reduces credential sprawl and reduces the number of places where cloud-level privileges exist. Compromising a cluster no longer immediately yields IaaS privileges.

This isn’t risk-free: it expands the Director API surface and makes availability important for new disk operations, so it needs guardrails (tight scopes, per-deployment credentials, network restrictions, and strong auditing). But compared to the current workarounds, I think this is a net improvement in least privilege and operational security.

More broadly, I view this as a foundational primitive: not valuable on its own, but an enabling capability that makes it significantly easier for anyone to build stateful workloads on top of BOSH without reinventing (and re-securing) a parallel disk-control plane in every deployment.

@rkoster
Contributor

rkoster commented Jan 13, 2026

@mariash Thank you for providing context on the routing-release example. However, I don't think routing-release#451 (cloudfoundry/routing-release#451) supports the precedent you're citing. That issue was a straightforward library choice (whether to use go-metric-registry vs Prometheus directly) — a minor implementation decision with no architectural implications, no new API surface, and no changes to operational or security models.
What would demonstrate the claimed precedent is an example where the community accepted an opt-in feature that:

  • Introduced significant new capabilities with commercial/specific use-case motivation
  • Added maintenance burden with commitment from maintainers to support it
  • Had comparable scope: new APIs, security considerations, or operational model changes

Separately, I'd like to raise a concern about the structure of this RFC. It appears to combine two distinct proposals:

  1. Dynamic Disk Lifecycle Management
    The ability to create, attach, detach, move, and delete persistent disks on BOSH-managed VMs with more flexibility than currently exists.
  2. Out-of-Band Runtime API
    A new /dynamic_disks/* API that allows external systems to provision disks at runtime, bypassing the deployment manifest.

These have very different implications. Proposal 1 could potentially fit within BOSH's existing model — disk operations triggered through bosh deploy with the manifest remaining the source of truth. Proposal 2 is what introduces the architectural concerns we've raised: runtime dependency on the Director, credentials in workload boundaries, and state that exists outside the manifest.

Question: Can the dynamic disk lifecycle capabilities be implemented without the out-of-band API? For example, if a CSI controller needs to provision a disk, could it update the deployment manifest and trigger a deploy rather than calling a runtime API directly?

If these proposals can be decoupled, I'd suggest splitting the RFC. The community could evaluate the disk lifecycle improvements on their own merits, separate from the more contentious question of whether BOSH should move away from the "everything is a bosh deploy" paradigm.
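As a toy illustration of the manifest-as-source-of-truth alternative raised in the question above, a controller could mutate the manifest and hand the result to `bosh deploy`. The manifest structure and field names below are simplified assumptions for the sketch, not the real BOSH manifest schema:

```python
import copy

def add_persistent_disk(manifest, instance_group, disk_cid, size_mb):
    """Return a new manifest with an extra disk entry for the given
    instance group; the caller would then run `bosh deploy` with it.
    (Field names here are illustrative, not the real BOSH schema.)"""
    updated = copy.deepcopy(manifest)
    for ig in updated["instance_groups"]:
        if ig["name"] == instance_group:
            ig.setdefault("persistent_disks", []).append(
                {"cid": disk_cid, "size_mb": size_mb}
            )
            return updated
    raise KeyError(f"instance group {instance_group!r} not found")

manifest = {"instance_groups": [{"name": "worker"}]}
new_manifest = add_persistent_disk(manifest, "worker", "disk-abc", 10240)
```

The original manifest stays untouched, which mirrors the declarative model: the deploy consumes a new desired state rather than an imperative "attach now" call.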

@rkoster
Contributor

rkoster commented Jan 13, 2026

@Alphasite I agree with your framing of this as a foundational primitive, and I share your concern about IaaS credentials sprawling into workload boundaries. Centralizing disk operations is a sound architectural goal.
However, I don't think BOSH itself needs to implement the full runtime API to achieve this. The primitive that BOSH should provide is dynamic disk lifecycle management — the ability to create, attach, detach, move, and delete disks on VMs. This could remain within BOSH's existing model, triggered through bosh deploy with the manifest as source of truth.
The runtime orchestration layer (the thing that decides when to provision disks and calls the appropriate APIs) could be a separate component that consumes BOSH's disk primitives. This component — whether it's a CSI controller, a dedicated disk provisioner, or something else — would be responsible for the reconcile/retry logic, credential management, and integration with workload schedulers.
This separation would:

  • Keep BOSH focused on IaaS abstraction and VM lifecycle (what it's good at)
  • Avoid introducing runtime dependencies on the Director
  • Allow the orchestration layer to evolve independently (different implementations for different use cases)
  • Still achieve the security benefits you described (IaaS credentials stay in trusted infrastructure, not workloads)

@beyhan beyhan moved this from Inbox to In Progress in CF Community Jan 13, 2026
@Alphasite
Contributor

@rkoster quick thoughts — I need to think about this more, but:

From a design perspective I actually like the shape of it. It avoids introducing brand-new user-facing constructs, and conceptually it’s similar to the vSphere Supervisor model: the guest cluster requests disks, and a para-virtual CSI proxies those requests to a higher-level control plane where they’re resolved declaratively.

Where I’m struggling is the fit with the current BOSH implementation model.

My biggest concern is throughput / contention. Today most Director work is effectively serialized at the deployment level. Naively, that suggests we’d end up processing disk attach/detach operations one-at-a-time (or at least with very limited concurrency), which could become a hard bottleneck for larger clusters doing lots of volume churn.

That might be fine for a ~5–10 node / ~1000 pod cluster, but once you get into 100–1000 nodes, I worry we’ll be “always behind” on attaches. Batching/coalescing could improve throughput, but then tail latency becomes the problem (and storage provisioning latency becomes user-visible).
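The batching/coalescing idea can be sketched roughly as follows. This is a hypothetical illustration (the request fields and the net-no-op cancellation rule are my assumptions), not a proposed Director implementation:

```python
from collections import defaultdict

def coalesce(requests):
    """Group pending disk operations by deployment so one Director task
    can handle a batch, dropping attach/detach pairs for the same disk
    that cancel each other out. (Illustrative sketch only.)"""
    by_deployment = defaultdict(dict)  # deployment -> disk_cid -> op
    for req in requests:
        ops = by_deployment[req["deployment"]]
        prev = ops.get(req["disk"])
        if prev and prev != req["op"]:
            del ops[req["disk"]]  # attach followed by detach: net no-op
        else:
            ops[req["disk"]] = req["op"]
    return {d: ops for d, ops in by_deployment.items() if ops}

pending = [
    {"deployment": "k8s", "disk": "d1", "op": "attach"},
    {"deployment": "k8s", "disk": "d2", "op": "attach"},
    {"deployment": "k8s", "disk": "d1", "op": "detach"},
]
batches = coalesce(pending)
```

Coalescing like this raises throughput per Director task, but, as noted above, it trades against tail latency for the individual requests swallowed into a batch.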

I’m also unsure how upgrades would work in practice. A rolling update evicts workloads; workloads reschedule and may need to reattach volumes. But during an update, manifest changes are blocked, and if the scheduler needs to attach volumes to make progress while the Director is busy rolling/recreating VMs, it’s easy to imagine a deadlock: the upgrade wants the node drained, draining causes reschedules that require attaches, and attaches can’t proceed fast enough (or can’t proceed at all) until the upgrade finishes.

I’m not as familiar with BOSH internals as you are, so I may be missing something or there may be an obvious solution here — but for the manifest-based approach, does making dynamic disks work well at scale essentially require a desired-state / continuous reconciliation model (rather than imperative “do X now” calls) to get acceptable throughput and predictable upgrade behavior?

@beyhan
Member

beyhan commented Jan 14, 2026

Update from the TOC meeting yesterday: we agreed to continue the concept discussion during the FI WG meeting on Thursday 4:30 p.m. CET.

@beyhan
Member

beyhan commented Jan 16, 2026

We discussed the proposed concept for dynamic disks during yesterday's FI WG meeting. While there were alternative proposals for how to implement dynamic disks support, the outcome is that we'll continue with the proposal in this RFC because it builds on and extends concepts which are currently available in BOSH. Nevertheless, we had a really great discussion about BOSH history (thank you all for that!), and I highly recommend watching the recordings when they become available ;-).

@Gerg
Member

Gerg commented Jan 20, 2026

We discussed the proposed concept for dynamic disks during yesterday's FI WG meeting. While there were alternative proposals for how to implement dynamic disks support, the outcome is that we'll continue with the proposal in this RFC because it builds on and extends concepts which are currently available in BOSH. Nevertheless, we had a really great discussion about BOSH history (thank you all for that!), and I highly recommend watching the recordings when they become available ;-).

Here is the link to the recording: https://youtu.be/OPqFNpIYqFk?si=vFPqOxEn1It6Wz4D

@mariash
Member Author

mariash commented Mar 17, 2026

Member

@beyhan beyhan left a comment


Thank you @mariash for including the Diego integration plan and the authorization options!

I personally lean towards the architecture of this proposal over #1453, but I do see one advantage in that proposal which is missing here: native support for BOSH deployments where the BOSH Director runs without UAA. I think we should discuss this and have a concept for how it can be achieved. It doesn't necessarily have to go via the BOSH Agent; maybe BOSH links, for example. Of course, as mentioned, the implementation can be carried out by multiple contributors, but it would be great to at least agree on a concept upfront, so that the full picture around this is clear.

##### Components

1. **Service Broker** -- Handles `cf create-service` / `cf bind-service`. Existing pattern from NFS/SMB volume services. Creates disk records, returns volume mount config (driver name + disk CID) in bind response.
2. **Volume Driver** -- Docker Volume Plugin v1.12, collocated on Diego cells. Discovered by volman via spec file in `/var/vcap/data/voldrivers`. Reusable pattern from existing NFS/SMB volume services. On mount, notifies Disk Provisioner that a disk is needed on this cell. Waits for the disk to appear. Formats, partitions, and mounts. On unmount, notifies Disk Provisioner that the disk is no longer needed.
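The "waits for the disk to appear" step in the Volume Driver description could look roughly like the polling loop below. This is a hedged sketch with invented names (`wait_for_device`, `fake_device`), not code from the RFC or the existing NFS/SMB drivers:

```python
import time

def wait_for_device(device_exists, timeout=5.0, poll=0.01):
    """Poll until the attached disk's device node appears, as a volume
    driver would before formatting and mounting it. Returns False on
    timeout so the mount call can fail cleanly. (Illustrative sketch.)"""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if device_exists():
            return True
        time.sleep(poll)
    return False

# Simulate a device that appears after a few polls.
state = {"polls": 0}
def fake_device():
    state["polls"] += 1
    return state["polls"] >= 3

appeared = wait_for_device(fake_device)
```

A real driver would check for the device path the IaaS reports (and bound the timeout against the scheduler's mount deadline), but the wait-then-mount shape is the same.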
Member


I have two questions here:

  1. From the sequence diagram below I can't follow how the volume driver notifies the disk provisioner. The sequence diagram shows the rep calling the driver. Is the explanation here wrong, or the diagram below?
  2. What is the relation between an app instance and a disk? Does every app instance get its own disk?

Member Author


Hi @beyhan,

  1. I updated the RFC. There is no need to notify the Disk Provisioner from the Volume Driver.
  2. This could be implemented differently by the Disk Provisioner, depending on the supported IaaS. If the IaaS allows attaching a disk to multiple VMs, then the instances of an app can share one disk. If the IaaS does not support that, each app instance should get its own disk, because disks cannot be shared. In the case I was describing it is the second approach, but the first can be implemented as well if needed by the design.

@cppforlife

there was no emoji for agree, but i mostly agree with @beyhan's sentiment.

native support for BOSH deployments where BOSH Director runs without UAA.

API front-door of bosh (BOSH Director API) supports two identity provider modes: local Director users & UAA. local users are primed in the Director [0] and can be configured with scopes (just like in UAA). the front-door API does not care which identity provider is plugged in, so generally speaking this proposal is not tied to UAA (though it's probably worth noting that use of UAA is strongly recommended, as it allows configuring BOSH with an upstream IDP).

[0] https://github.com/cloudfoundry/bosh/blob/9f9020221fc93528ca97b90ba98325651a889516/src/bosh-director/lib/bosh/director/api/local_identity_provider.rb#L11

maybe BOSH links, for example

BOSH links approach to me feels quite right (even though we do not have it implemented) as it builds upon our existing configuration mgmt system and allows scoping down the "destination" for the creds to a particular job. the same idea was imagined (though not implemented due to time constraints) for service brokers that may want to communicate back to the director to perform creation of full deployments. (today of course this is done in a separate way.)

@mkocher
Member

mkocher commented Mar 18, 2026

I wanted to chime in here as I’ve been hearing about this quite a bit. There’s going to be a very high internal cost to the Broadcom engineers if we can’t get these PRs merged this week. This cost will make it harder for us to contribute features to bosh going forward. We realize this is our problem and not the community's, but I wanted to put it out there.

We understand the desire to see the end to end flow spelled out, and have added that to the RFC.

I’d like to propose we break out an improved authorization system for bosh APIs, and an improved way of calling BOSH Director APIs from BOSH-deployed VMs, into another RFC. This new dynamic disks API could at any time benefit from that work. We expect the HTTP endpoints will always be useful for accessing this functionality from the CLI for one-off operations.

We’d like to find a path forward that allows us to merge something very similar to these changes now but does not paint bosh into a corner in the long run.

@rkoster
Contributor

rkoster commented Mar 19, 2026

BOSH links approach to me feels quite right (even though we do not have it implemented) as it builds upon our existing configuration mgmt system and allows to scope down "destination" for the creds to particular job. same idea was imagined (though not implemented due to time constraints) for service brokers that may want to communicate back to director for perform creation of full deployments. (today of course this is done in a separate way).

The bosh link would need to contain an endpoint as well as something to authenticate the caller with. Doing something dynamic against UAA feels like too much overhead; also, this would be a new pattern for which we would have to figure out credential rotation.

The most promising direction still seems to be instance identity, where the NATS client cert already acts as a kind of implicit identity today. I did a quick exploration of what a dual‑auth scenario (UAA OAuth client creds and/or mTLS using agent identity) could look like.

The idea would be:

  • the Director exposes the usual HTTPS API
  • a job authenticates using its agent identity over mTLS
  • the Director authorizes based on the scoped permissions associated with that agent's instance group.

We would need to formally define which permissions/scopes are valid for UAA‑authenticated clients vs. which are available when authenticating via agent‑identity mTLS. The permission model outlined in the dynamic disks RFC seems like a good foundation.

I think clarifying this split (what’s allowed under UAA vs. what’s allowed under agent identity) should be explicitly part of #1401, with a future-work section describing mTLS agent-identity auth.
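One way to picture the UAA-vs-agent-identity split is a single authorization decision point in the Director. The scope names below are purely hypothetical placeholders (not the ones proposed in the RFC), and the function is a sketch of the idea, not Director code:

```python
def allowed_scopes(auth):
    """Map an authenticated caller to the disk-API scopes it may use.

    auth is either ("uaa", [granted_scopes]) for an OAuth client, or
    ("agent_mtls", instance_group_name) for agent-identity mTLS.
    All scope names here are invented for illustration.
    """
    kind, detail = auth
    if kind == "uaa":
        # UAA clients keep whatever disk scopes UAA granted them.
        return set(detail) & {"bosh.disks.read", "bosh.disks.admin"}
    if kind == "agent_mtls":
        # Agent identity is confined to its own instance group's disks.
        return {f"bosh.disks.{detail}.manage"}
    return set()

uaa_scopes = allowed_scopes(("uaa", ["bosh.disks.admin", "bosh.admin"]))
agent_scopes = allowed_scopes(("agent_mtls", "worker"))
```

The sketch makes the dual-auth property concrete: a UAA client can hold broad disk scopes, while an agent-identity caller is automatically narrowed to its own instance group.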

@beyhan
Member

beyhan commented Mar 19, 2026

We discussed this during the FI WG meeting and we agreed that the proposed architecture in this RFC is the one we want to proceed with. The only open point, which should be added as a future improvement in the RFC, is the authorization scope, which should also work with the exploration suggested in #1401 (comment).

@mariash
Member Author

mariash commented Mar 19, 2026

@beyhan @rkoster Addressed the authorization model comment:

  • Scope names are updated.
  • Future work for agent mTLS authorization is added from the other RFC

https://github.com/mariash/community/blob/dynamic-disks/toc/rfc/rfc-draft-dynamic-disks.md#authorization-model

Please let me know if the RFC can officially be moved to FCP.

@mariash mariash force-pushed the dynamic-disks branch 2 times, most recently from 803afe1 to 64d24ba on March 19, 2026 at 18:35
@cweibel

cweibel commented Mar 19, 2026

I'm fine with moving this to a final comment period and will leave it to the rest to determine what that amount of time is.

@Gerg
Member

Gerg commented Mar 20, 2026

A majority of the TOC has approved the motion to start the FCP for this RFC.

Given that there has already been extensive discussion on this RFC, we may do an abridged FCP, ending on next Tuesday's TOC meeting, as per the CFF RFC bylaws:

Once the FCP starts, approvals shall be given using the GitHub review system. The PR can be merged or closed early if there is consensus among the decision makers.

As usual, any significant issues raised during the FCP may extend the approval window.

Member

@aramprice aramprice left a comment


🎉

@beyhan beyhan merged commit 8abc02d into cloudfoundry:main Mar 24, 2026
1 check passed
@github-project-automation github-project-automation bot moved this from In Progress to Done in CF Community Mar 24, 2026

Labels

rfc CFF community RFC toc

Projects

Status: Done
