perf(ssh): bulk file sync via tar pipe instead of per-file scp #7465

@alt-glitch

Description

Problem

SSHEnvironment uploads files one-by-one via scp during FileSyncManager.sync(). With ~581 files (skills, credentials, caches), this means 581 separate scp invocations — each spawning a subprocess and doing a round-trip, even over a ControlMaster socket.

On Daytona (similar sequential pattern), this took 803s for 581 files. SSH with ControlMaster is faster per-file but still O(n) round-trips.
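For reference, the current per-file pattern looks roughly like this (an illustrative sketch, not the actual `FileSyncManager` code; `build_scp_cmd` and `scp_upload_all` are hypothetical names):

```python
import subprocess

def build_scp_cmd(ssh_opts: list[str], host: str, local: str, remote: str) -> list[str]:
    # Hypothetical helper mirroring what _scp_upload() does for one file.
    return ["scp", *ssh_opts, local, f"{host}:{remote}"]

def scp_upload_all(ssh_opts: list[str], host: str, files: list[tuple[str, str]]) -> None:
    # ~581 files -> ~581 subprocess spawns and network round-trips,
    # even when a ControlMaster socket amortizes TCP/auth setup.
    for local, remote in files:
        subprocess.run(build_scp_cmd(ssh_opts, host, local, remote), check=True)
```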

Proposed Solution

Wire a bulk_upload_fn into the SSH backend's FileSyncManager (the callback was added in #7447). Use tar piped over SSH to transfer all files in a single stream:

def _ssh_bulk_upload(self, files: list[tuple[str, str]]) -> None:
    # tar up local files, pipe through ssh, extract on remote
    # Single TCP stream, single round of subprocesses
    tar_cmd = ['tar', 'czf', '-'] + [local for local, _ in files]
    ssh_cmd = self._build_ssh_command() + ['tar', 'xzf', '-', '-C', '/']
    # pipe tar | ssh
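A fuller sketch of how the pipe could be wired (assuming `_build_ssh_command()` returns the base `ssh` argv as a list; `build_bulk_upload_cmds` and `bulk_upload` are hypothetical names, and local and remote paths are assumed identical, which is what extracting at `/` requires):

```python
import subprocess

def build_bulk_upload_cmds(ssh_base: list[str],
                           files: list[tuple[str, str]]) -> tuple[list[str], list[str]]:
    # GNU tar strips the leading '/' from absolute member names on create,
    # so extracting with -C / on the remote side restores the same paths.
    tar_cmd = ["tar", "czf", "-"] + [local for local, _ in files]
    ssh_cmd = ssh_base + ["tar", "xzf", "-", "-C", "/"]
    return tar_cmd, ssh_cmd

def bulk_upload(ssh_base: list[str], files: list[tuple[str, str]]) -> None:
    tar_cmd, ssh_cmd = build_bulk_upload_cmds(ssh_base, files)
    # tar writes the archive to stdout; ssh forwards it to the remote tar's
    # stdin -- one stream, two local subprocesses total.
    tar = subprocess.Popen(tar_cmd, stdout=subprocess.PIPE)
    ssh = subprocess.Popen(ssh_cmd, stdin=tar.stdout)
    tar.stdout.close()  # let tar receive SIGPIPE if ssh exits early
    if ssh.wait() != 0 or tar.wait() != 0:
        raise RuntimeError("bulk upload failed")
```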

Alternatively, rsync with --files-from would handle both uploads and deletes in one call.
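The rsync variant might look like this (a sketch; `build_rsync_cmd` and `rsync_bulk_upload` are hypothetical names, and paths in the list file are relative to `src_root`):

```python
import subprocess
import tempfile

def build_rsync_cmd(host: str, src_root: str, dest_root: str, list_path: str) -> list[str]:
    # NB: -a does not imply -r when --files-from is used; add -r explicitly
    # if directories in the list should be copied recursively. For deletes,
    # newer rsync (>= 3.1) offers --delete-missing-args.
    return [
        "rsync", "-az",
        "--files-from=" + list_path,
        src_root,
        f"{host}:{dest_root}",
    ]

def rsync_bulk_upload(host: str, src_root: str, dest_root: str, rel_paths: list[str]) -> None:
    # rsync reads the transfer list from a file, one relative path per line.
    with tempfile.NamedTemporaryFile("w", suffix=".list") as f:
        f.write("\n".join(rel_paths) + "\n")
        f.flush()
        subprocess.run(build_rsync_cmd(host, src_root, dest_root, f.name), check=True)
```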

Context

  • FileSyncManager now supports an optional bulk_upload_fn callback, added in #7447 (fix(daytona): bulk upload, config bridge, silent disk cap)
  • Daytona bulk upload went from 803s → 4.3s (188× faster) using the same pattern
  • SSH backend at tools/environments/ssh.py, upload method: _scp_upload()
  • The ControlMaster socket reuse helps but doesn't eliminate per-file overhead


Labels

backend/file-sync · backend/ssh · type/feature · type/perf
