Skip to content

On demand plugins#541

Merged
mjh1 merged 25 commits intomainfrom
mh/plugins-od
Mar 5, 2026
Merged

On demand plugins#541
mjh1 merged 25 commits intomainfrom
mh/plugins-od

Conversation

@mjh1
Copy link
Copy Markdown
Contributor

@mjh1 mjh1 commented Feb 26, 2026

Cloud Plugin Support & Separate Docker Image

Summary

  • Add a dedicated cloud Docker image (Dockerfile.cloud) and update CI to build both standard and cloud variants using a matrix strategy, with -cloud suffixed image tags
    • Main reason was to speed up cloud load times, i found that it was faster to not install any of the dependencies into the image and download them at runtime instead. They total about 8gb so it really slows down the docker image pull (and github build times). I also omit the frontend from this image.
  • Enable whitelisted plugin installation in cloud mode — instead of blocking all plugin installs, plugins are now allowed if they match an entry in the CLOUD_PLUGINS_WHITELIST environment variable (with URL normalization for flexible matching)
  • Proxy plugin API endpoints through the cloud WebSocket — the /health, /api/v1/restart, and all /api/v1/plugins endpoints now use the @cloud_proxy() decorator so the frontend can manage plugins on the remote cloud instance
  • Clean up plugins between sessions — on WebSocket disconnect, all installed plugins are uninstalled and session data is wiped, with a gate (_cleanup_event) that blocks the next connection's "ready" message until cleanup finishes
  • Harden cloud subprocess environment — the scope subprocess now receives only whitelisted environment variables instead of inheriting the full parent environment

Changes

File What changed
Dockerfile.cloud New cloud-specific Dockerfile based on nvidia/cuda:12.8.0-cudnn-runtime-ubuntu24.04
.github/workflows/docker-build.yml Matrix strategy for standard / cloud builds with per-variant tags, dockerfiles, and GHA cache scopes
src/scope/cloud/fal_app.py Whitelist-based plugin install gating, plugin cleanup on disconnect, env var whitelisting, uv run --extra kafka instead of pre-install, assets dir moved to /tmp/
src/scope/server/app.py Added @cloud_proxy() to health, restart, and all plugin CRUD endpoints; renamed request -> http_request to avoid shadowing
Screenshot 2026-02-26 at 18 22 42

You can now connect to cloud and go to the plugin settings discover tab and click install, same as the local experience.

mjh1 added 14 commits February 23, 2026 17:33
Signed-off-by: Max Holland <[email protected]>
Signed-off-by: Max Holland <[email protected]>
Signed-off-by: Max Holland <[email protected]>
Signed-off-by: Max Holland <[email protected]>
Signed-off-by: Max Holland <[email protected]>
Signed-off-by: Max Holland <[email protected]>
Signed-off-by: Max Holland <[email protected]>
Signed-off-by: Max Holland <[email protected]>
Signed-off-by: Max Holland <[email protected]>
mjh1 added 3 commits February 26, 2026 12:54
Signed-off-by: Max Holland <[email protected]>
Signed-off-by: Max Holland <[email protected]>
Signed-off-by: Max Holland <[email protected]>
@mjh1 mjh1 requested a review from emranemran February 26, 2026 18:17
mjh1 added 2 commits February 27, 2026 15:11
disable plugins install while stream running or cloud connecting

Signed-off-by: Max Holland <[email protected]>
@mjh1 mjh1 marked this pull request as ready for review February 27, 2026 16:11
@mjh1 mjh1 requested a review from leszko February 27, 2026 16:11
@mjh1
Copy link
Copy Markdown
Contributor Author

mjh1 commented Feb 27, 2026

@emranemran hopefully you don't mind me leaving all the commits untidy, i wanted to preserve the history of doing the built in plugins to begin with.

@cursor
Copy link
Copy Markdown

cursor bot commented Mar 2, 2026

You have run out of free Bugbot PR reviews for this billing cycle. This will reset on April 6.

To receive reviews on all of your PRs, visit the Cursor dashboard to activate Pro and start your 14-day free trial.

Copy link
Copy Markdown
Collaborator

@leszko leszko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added some comments. PTAL.Please send me also some environment where I can test it. Thanks.

Comment on lines +689 to +690
normalized = re.sub(r"^git\+https?://", "", normalized)
normalized = re.sub(r"^https?://", "", normalized)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, we support only URLs for plugins, right? Not local path and not PyPI packages. If that's the intention, then I'd add some comment about it. Maybe even in UI, it should be explicit that if you're in the cloud mode, you can only install from URLs.

Copy link
Copy Markdown
Contributor Author

@mjh1 mjh1 Mar 4, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a note to the UI 👍 in cloud mode we're only allowing installation from the discover tab for now, url entry is hidden.

Comment on lines +233 to +248
for plugin in plugins:
name = plugin.get("name")
if not name:
continue
try:
resp = await client.delete(
f"{scope_url}/api/v1/plugins/{name}", timeout=60.0
)
if resp.status_code == 200:
print(f"Cleanup: uninstalled plugin '{name}'")
else:
print(
f"Warning: Failed to uninstall plugin '{name}': {resp.status_code}"
)
except Exception as e:
print(f"Warning: Failed to uninstall plugin '{name}': {e}")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if it would make sense to delete plugins in parallel?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure that will be ok? I can test it out anyway

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm maybe i'll avoid it for now:

There are a few concurrency hazards:
- pipeline_manager.unload_pipeline_by_id() mutates shared state (_pipelines, _pipeline_statuses, _pipeline_load_params dicts, and main pipeline state). The internal helper is even named _unload_pipeline_by_id_unsafe -- it must be called with the lock held. Running multiple unloads concurrently risks corrupting that shared state.
- CUDA cleanup -- each uninstall calls torch.cuda.empty_cache() and torch.cuda.synchronize(). Concurrent calls to these while pipelines are being torn down could interfere with each other.
- pip uninstall -- the sync implementation ultimately shells out to pip. Running multiple pip processes concurrently is not safe; pip uses a single shared site-packages directory and doesn't handle concurrent modifications.
- The plugin manager lock (self._lock) only guards the registry unregister section, not the entire uninstall operation, so it doesn't provide end-to-end protection.

I was thinking it sounded risky to have concurrent pip commands.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok 👌

"KAFKA_BOOTSTRAP_SERVERS",
"KAFKA_TOPIC",
"KAFKA_SASL_USERNAME",
"KAFKA_SASL_PASSWORD",
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, since we pass credentials into Scope, do we already have some safeguard for the plugins that are added into Daydream Plugin registry?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My plan is to create a write only user for this so that if they ever did leak it would be much less of a concern, they wouldn't be able to read our data. Then we can also have a scan over the current plugins to check nothing looks dangerous.

return normalized

def is_plugin_whitelisted(package: str) -> bool:
raw = os.getenv("CLOUD_PLUGINS_WHITELIST", "")
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are you going to update this plugin list env variable?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a variable that lives in fal, it can be updated through their CLI or web dashboard.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, but who will update it? Will we have some automation which will take our plugins and update this variable?

uses: actions/setup-python@v5
with:
python-version: '3.10'
python-version: '3.12'
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we sure that using different Python version won't have any effect?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good in testing, I can do some more exhaustive testing though. In our pyproject file we specify python 3.12 so I think this was just an accidental case where the docker images were on 3.10 because that was the default installed with the older ubuntu version. I updated that which means it also needs to be updated here for the fal deploy.

request_id = payload.get("request_id")

# Block plugin installation in cloud mode (security: prevent arbitrary code execution)
if method == "POST" and path == "/api/v1/plugins":
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if this check is good enough. The path comes from the WebSocket message payload, so someone could hack like /api/v1/plugins?hack=true and then this check will not apply. Isn't that an issue?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh yep good spot!

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 2, 2026

🚀 fal.ai Preview Deployment

App ID daydream/scope-pr-541--preview
WebSocket wss://fal.run/daydream/scope-pr-541--preview/ws
Commit 50aacd6

Testing

Connect to this preview deployment by running this on your branch:

uv run build && SCOPE_CLOUD_APP_ID="daydream/scope-pr-541--preview/ws" uv run daydream-scope

🧪 E2E tests will run automatically against this deployment.

@github-actions
Copy link
Copy Markdown
Contributor

github-actions bot commented Mar 2, 2026

✅ E2E Tests passed

Status passed
fal App daydream/scope-pr-541--preview
Run View logs

Test Artifacts

Check the workflow run for screenshots.

@leszko
Copy link
Copy Markdown
Collaborator

leszko commented Mar 3, 2026

A few more comments after trying it out:

  1. What about the artifacts defined in the plugins, do we plan to store all model weights for each plugin in the network storage in FAL?
  2. I tried LTX-2 plugin and it hasn't worked, could you check?
  3. So, now all plugins installed in the cloud and ephemeral (you reconnect, you don't see them again). I wonder if we should not store locally somewhere the names of the plugins and install them? This is something that @thomshutt mentioned in his doc. Maybe if we have workflow export/import (this week), then it won't be such a pain, because you'll be able to export and import the workflow (which will automatically install all the plugins).

@mjh1
Copy link
Copy Markdown
Contributor Author

mjh1 commented Mar 4, 2026

A few more comments after trying it out:

  1. What about the artifacts defined in the plugins, do we plan to store all model weights for each plugin in the network storage in FAL?
  2. I tried LTX-2 plugin and it hasn't worked, could you check?
  3. So, now all plugins installed in the cloud and ephemeral (you reconnect, you don't see them again). I wonder if we should not store locally somewhere the names of the plugins and install them? This is something that @thomshutt mentioned in his doc. Maybe if we have workflow export/import (this week), then it won't be such a pain, because you'll be able to export and import the workflow (which will automatically install all the plugins).
  1. That's right the model weights all get downloaded to fal on first run, I've done this for all of the current ones.
  2. I'll check that 👍
  3. Yes that's right we plan to solve that with reloading workflows like we discussed earlier.

@mjh1 mjh1 requested a review from leszko March 4, 2026 13:42
Signed-off-by: Max Holland <[email protected]>
@mjh1
Copy link
Copy Markdown
Contributor Author

mjh1 commented Mar 4, 2026

Screenshot 2026-03-04 at 13 51 07

Wow that is not what i see locally 🤯 i didn't realise github merged the branch temporarily with main then uses that for these github actions.

Copy link
Copy Markdown
Collaborator

@leszko leszko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. The one question that is unanswered is this one:

ok, but who will update it? Will we have some automation which will take our plugins and update this variable?

I just want to make sure we have a plan about updating this. Other than that LGTM. Feel free to Squash and Merge.

@mjh1 mjh1 merged commit 674efaa into main Mar 5, 2026
7 checks passed
@mjh1 mjh1 deleted the mh/plugins-od branch March 5, 2026 12:02
@mjh1
Copy link
Copy Markdown
Contributor Author

mjh1 commented Mar 5, 2026

Looks good. The one question that is unanswered is this one:

ok, but who will update it? Will we have some automation which will take our plugins and update this variable?

I just want to make sure we have a plan about updating this. Other than that LGTM. Feel free to Squash and Merge.

Yep i'll make a github action to update it 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants