feat: upload and extract zip/tar archives into workspace #1226

Closed

bergeouss wants to merge 2 commits into nesquena:master from bergeouss:feat/525-upload-zip-folder
Conversation

@bergeouss
Contributor

Fixes #525

What

Allow uploading .zip, .tar.gz, .tgz, .tar.bz2, and .tar.xz archives; they are automatically extracted into a named subfolder of the workspace.

How

  • extract_archive() in api/upload.py: detects archive type, extracts into a named subfolder (archive stem). Uses Python stdlib only (zipfile, tarfile).
  • Zip-slip / tar-slip protection: every member path is resolved and verified to start with the destination directory prefix. No symlinks or .. traversal allowed.
  • Collision avoidance: if the target folder name exists, appends a 3-digit suffix.
  • New route POST /api/upload/extract: same multipart format as /api/upload, but calls extract_archive().
  • Frontend: uploadPendingFiles() auto-detects archive files via _ARCHIVE_EXTS regex and routes them to the extract endpoint. Workspace file tree refreshes after extraction.
  • Archive extensions added to the file picker accept attribute.
  • Toast notification shows extraction summary (file count, archive count).
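The traversal guard described above can be sketched as follows. This is a minimal illustration of the resolve-then-prefix-check idea, not the PR's actual code; the helper name is hypothetical.

```python
import os
from pathlib import Path

def is_safe_member(dest_dir: Path, member_name: str) -> bool:
    """Reject archive members that would escape dest_dir (zip-slip / tar-slip).

    Illustrative sketch: resolve the candidate path, then require that it is
    the destination itself or lives strictly underneath it.
    """
    base = dest_dir.resolve()
    target = (base / member_name).resolve()
    # Appending os.sep avoids a sibling-directory false positive
    # (e.g. /ws/dest-evil matching the prefix /ws/dest).
    return target == base or str(target).startswith(str(base) + os.sep)
```

A member named `../escape.txt` or an absolute path like `/etc/passwd` resolves outside the destination and is rejected before extraction.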

Changes

  • api/upload.py: extract_archive(), handle_upload_extract()
  • api/routes.py: new import + /api/upload/extract route
  • static/ui.js: _ARCHIVE_EXTS regex, archive-aware upload logic, toast
  • static/index.html: archive extensions in file input accept
  • static/i18n.js: archive_extracted key in all 7 locales
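The frontend's `_ARCHIVE_EXTS` check is a JavaScript regex; a Python analogue of the same extension test looks like this. The pattern below is an assumption built from the formats named in the PR description, not the actual regex in `static/ui.js`.

```python
import re

# Hypothetical mirror of the frontend's _ARCHIVE_EXTS extension check.
# Matches the archive formats listed in the PR description.
_ARCHIVE_EXTS = re.compile(r"\.(zip|tgz|tar\.(gz|bz2|xz))$", re.IGNORECASE)

def is_archive(filename: str) -> bool:
    """Return True if the filename ends in a supported archive extension."""
    return bool(_ARCHIVE_EXTS.search(filename))
```

Files matching the pattern are routed to `/api/upload/extract`; everything else goes through the plain `/api/upload` path.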

Testing

2828 passed, 0 failed (full suite).

- Add extract_archive() with zip-slip and tar-slip protection
- New /api/upload/extract endpoint for archive uploads
- Auto-detect archive files (.zip, .tar.gz, .tgz, .bz2, .xz)
- Archives extracted into named subfolder (avoids overwrites)
- Workspace file tree auto-refreshes after extraction
- Archive extensions added to file picker accept list
- i18n: archive_extracted key in all 7 locales

Security: path traversal blocked via resolve() prefix check,
matching existing safe_resolve_ws() sandbox pattern.
@nesquena-hermes
Collaborator

Thanks for the PR, @bergeouss! Archive extraction in the workspace is a useful workflow improvement — being able to upload a zip and have it auto-extracted saves a manual unzip step.

The security considerations are appreciated: zip-slip/tar-slip protection and no symlink traversal are the right safeguards for server-side archive extraction. A few things to verify before merge:

  1. Path resolution guard: Confirm that the zip-slip protection resolves member paths against the real destination directory (after os.path.realpath()) before comparing the prefix. Symlinks inside the archive that point outside the workspace could bypass a pure string-prefix check, so verifying that realpath() or an equivalent is used is important.

  2. Large archive handling: Is there a size limit on archives? A very large zip (or a zip bomb) could exhaust disk space or memory. A configurable or hardcoded max extraction size would be a good safety rail.

  3. Error handling / partial extraction: If extraction fails partway through (disk full, corrupted archive member), is the partial subfolder cleaned up, or does it remain? The collision-avoidance logic creates a new subfolder — confirming failed extractions don't leave orphaned directories would be good.

  4. File type detection: The _ARCHIVE_EXTS regex on the frontend determines routing to the extract endpoint. Is there also server-side validation of the archive content type (not just the extension)? A file named malicious.zip that isn't actually a zip should fail gracefully.

2828 passed with 0 failures is a clean signal. The 5-file scope is well-contained. The above are the main items to validate — especially the path resolution and size limits. Looking good!

- Add cumulative extraction size limit (_MAX_EXTRACTED_BYTES = 200 MB)
  that tracks uncompressed file sizes during extraction to guard against
  zip/tar bombs (small compressed archives that expand to huge sizes).
- On any extraction failure (disk full, corrupted member, size limit),
  clean up the partially-extracted destination directory to avoid
  leaving orphaned folders in the workspace.
@bergeouss
Contributor Author

Review Feedback Addressed

  • Issue (2) — Large archive / zip-bomb protection: Added a cumulative extraction size limit (_MAX_EXTRACTED_BYTES = 200 MB — 10x the upload limit). During extraction, each member's uncompressed size is tracked; if the total exceeds the limit, a clear error is raised. This prevents a small compressed archive from consuming excessive disk space.
  • Issue (3) — Partial extraction cleanup: Wrapped the extraction loop in a try/except. On any failure (disk full, corrupted member, size limit exceeded), the partially-extracted destination directory is cleaned up via shutil.rmtree() to avoid leaving orphaned folders in the workspace.
  • Fix: Both protections are in extract_archive() in api/upload.py. The cleanup runs before re-raising so the caller still gets the original error.
  • Files: api/upload.py
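The two protections for issues (2) and (3) can be sketched together like this. This is an illustrative zip-only reduction under the 200 MB limit named above; the helper name is hypothetical, and the PR's actual logic lives in extract_archive().

```python
import shutil
import zipfile
from pathlib import Path

_MAX_EXTRACTED_BYTES = 200 * 1024 * 1024  # 200 MB cap, per the PR

def extract_zip_capped(archive_path: Path, dest_dir: Path) -> None:
    """Extract a zip with a cumulative uncompressed-size cap.

    On any failure (size limit, corrupt member, disk full), the partially
    extracted destination directory is removed before the error propagates.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)
    total = 0
    try:
        with zipfile.ZipFile(archive_path) as zf:
            for info in zf.infolist():
                total += info.file_size  # uncompressed size from the header
                if total > _MAX_EXTRACTED_BYTES:
                    raise ValueError("archive exceeds extraction size limit")
                zf.extract(info, dest_dir)
    except Exception:
        # Clean up the partial subfolder, then re-raise so the caller
        # still sees the original error.
        shutil.rmtree(dest_dir, ignore_errors=True)
        raise
```

Tracking `info.file_size` mid-loop means a zip bomb is rejected as soon as the running total crosses the cap, without inspecting the whole archive upfront.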

Regarding the other points:

  • (1) Path resolution guard: Confirmed — (dest_dir / member.filename).resolve() uses Python's Path.resolve(), which follows symlinks; startswith(str(dest_dir.resolve())) then validates that the final resolved path stays within the workspace. A symlink inside the archive pointing to /etc/passwd would be resolved by .resolve() and caught by the prefix check.
  • (4) File type detection: Server-side validation happens implicitly — zipfile.ZipFile() and tarfile.open() will raise on invalid content. The error is caught by the generic handler and returned as "Archive extraction failed" (400). This is sufficient since the frontend already filters by extension.
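The implicit server-side validation described in (4) can be sketched as a small opener that maps stdlib parse errors to one graceful failure. The function name and the suffix dispatch are illustrative assumptions, not the PR's code.

```python
import tarfile
import zipfile
from pathlib import Path

def validate_and_open(path: Path):
    """Open an archive, failing gracefully on mislabelled content.

    A file named malicious.zip that is not actually a zip makes
    zipfile.ZipFile() raise BadZipFile; likewise tarfile.open() raises
    ReadError on non-tar content. Both map to one user-facing error.
    """
    try:
        if path.suffix.lower() == ".zip":
            return zipfile.ZipFile(path)
        return tarfile.open(path)
    except (zipfile.BadZipFile, tarfile.ReadError) as exc:
        # Upstream, this surfaces as the 400 "Archive extraction failed".
        raise ValueError("Archive extraction failed") from exc
```

No magic-byte sniffing is needed: the stdlib parsers reject invalid content on open, which is the behaviour the comment above relies on.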

🤖 AI-assisted via Hermes Agent

@nesquena-hermes
Collaborator

Thanks for following up, @bergeouss! Both of the substantive items are now addressed:

  • (2) Large archive / zip-bomb protection: The 200 MB cumulative extraction limit with mid-loop tracking is the right approach — prevents compressed-small/extracted-large attacks without needing to inspect the archive upfront.
  • (3) Partial extraction cleanup: shutil.rmtree() on failure before re-raising is correct cleanup semantics — the caller still gets the error, and the workspace doesn't accumulate orphaned directories.

The confirmations on path resolution and server-side type validation are also correct:

  • Path.resolve() + startswith(str(dest_dir.resolve())) correctly handles symlinks inside archives pointing outside the workspace.
  • zipfile.ZipFile() / tarfile.open() raising on invalid content is sufficient — the error surfaces cleanly to the user.

All four points are addressed. This looks ready for maintainer merge review.

@nesquena-hermes
Collaborator

Merged in v0.50.237 via #1243. Thank you @bergeouss! 🎉



Development

Successfully merging this pull request may close these issues.

feat(workspace): upload a zip/archive or folder and extract into workspace

2 participants