feat: upload and extract zip/tar archives into workspace #1226
bergeouss wants to merge 2 commits into nesquena:master
- Add extract_archive() with zip-slip and tar-slip protection
- New /api/upload/extract endpoint for archive uploads
- Auto-detect archive files (.zip, .tar.gz, .tgz, .bz2, .xz)
- Archives extracted into a named subfolder (avoids overwrites)
- Workspace file tree auto-refreshes after extraction
- Archive extensions added to file picker accept list
- i18n: archive_extracted key in all 7 locales

Security: path traversal blocked via resolve() prefix check, matching the existing safe_resolve_ws() sandbox pattern.
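The resolve()-then-prefix-check guard named in the commit message can be sketched in Python roughly as follows. This is a minimal illustration, not the PR's code: `safe_member_path` and `extract_zip_safely` are hypothetical names, and `Path.is_relative_to` (Python 3.9+) stands in for whatever prefix comparison `safe_resolve_ws()` actually performs.

```python
import zipfile
from pathlib import Path


def safe_member_path(dest: Path, member_name: str) -> Path:
    """Resolve member_name under dest; raise if it escapes dest.

    Hypothetical helper illustrating a resolve()-then-prefix check.
    """
    root = dest.resolve()
    # Absolute names ("/etc/passwd") replace root when joined, and ".."
    # segments are collapsed by resolve(), so both end up outside root.
    target = (root / member_name).resolve()
    if not target.is_relative_to(root):
        raise ValueError(f"blocked path traversal: {member_name!r}")
    return target


def extract_zip_safely(archive: Path, dest: Path) -> list[Path]:
    """Extract every member of a zip into dest after validating all names."""
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        infos = zf.infolist()
        # Check every name first so a bad member aborts before any write.
        for info in infos:
            safe_member_path(dest, info.filename)
        return [Path(zf.extract(info, dest)) for info in infos]
```

`ZipFile.extract()` already strips absolute prefixes and `..` components on its own; an explicit check like this mainly makes the policy visible and lets a malicious upload fail loudly instead of having its paths silently rewritten.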
Thanks for the PR, @bergeouss! Archive extraction in the workspace is a useful workflow improvement — being able to upload a zip and have it auto-extracted saves a manual unzip step. The security considerations are appreciated: zip-slip/tar-slip protection and no symlink traversal are the right safeguards for server-side archive extraction. A few things to verify before merge:
2828 passed with 0 failures is a clean signal. The 5-file scope is well contained. The above are the main items to validate, especially the path resolution and size limits. Looking good!
- Add cumulative extraction size limit (_MAX_EXTRACTED_BYTES = 200 MB) that tracks uncompressed file sizes during extraction to guard against zip/tar bombs (small compressed archives that expand to huge sizes).
- On any extraction failure (disk full, corrupted member, size limit), clean up the partially extracted destination directory to avoid leaving orphaned folders in the workspace.
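A rough sketch of the byte budget plus cleanup-on-failure behaviour, shown for tar archives. `MAX_EXTRACTED_BYTES`, `ExtractionLimitError` and `extract_tar_with_limit` are hypothetical stand-ins for `_MAX_EXTRACTED_BYTES` and the PR's own logic. The sketch counts bytes as they are copied out, so the limit trips in the middle of an oversized member rather than after it has been fully written, and it skips symlinks and hardlinks entirely.

```python
import shutil
import tarfile
from pathlib import Path

# Hypothetical stand-in for the PR's _MAX_EXTRACTED_BYTES (200 MB).
MAX_EXTRACTED_BYTES = 200 * 1024 * 1024
CHUNK_SIZE = 64 * 1024


class ExtractionLimitError(Exception):
    """Raised when an archive expands past the byte budget."""


def extract_tar_with_limit(archive: Path, dest: Path,
                           max_bytes: int = MAX_EXTRACTED_BYTES) -> int:
    """Extract directories and regular files from a tar into dest.

    Returns the total number of bytes written. On any error, dest is
    removed so no half-extracted folder is left behind.
    """
    # exist_ok=False fails before the try block, so the cleanup below
    # can never delete a folder that existed before this call.
    dest.mkdir(parents=True, exist_ok=False)
    root = dest.resolve()
    total = 0
    try:
        # "r:*" auto-detects gzip, bzip2, xz or no compression.
        with tarfile.open(archive, "r:*") as tf:
            for member in tf.getmembers():
                target = (root / member.name).resolve()
                if not target.is_relative_to(root):
                    raise ValueError(f"blocked path traversal: {member.name!r}")
                if member.isdir():
                    target.mkdir(parents=True, exist_ok=True)
                    continue
                if not member.isfile():
                    continue  # skip symlinks, hardlinks, devices and FIFOs
                target.parent.mkdir(parents=True, exist_ok=True)
                with tf.extractfile(member) as src, open(target, "wb") as out:
                    # Count bytes as they are copied so the limit trips mid-file.
                    while chunk := src.read(CHUNK_SIZE):
                        total += len(chunk)
                        if total > max_bytes:
                            raise ExtractionLimitError(
                                f"archive expands past {max_bytes} bytes")
                        out.write(chunk)
    except BaseException:
        shutil.rmtree(dest, ignore_errors=True)
        raise
    return total
```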
Review Feedback Addressed
Regarding the other points:
🤖 AI-assisted via Hermes Agent
Thanks for following up, @bergeouss! Both of the substantive items are now addressed:
The confirmations on path resolution and server-side type validation are also correct:
All four points are addressed. This looks ready for maintainer merge review.
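The review does not say how the server-side type check works. One plausible approach, sketched here purely as an assumption, is to sniff file contents with the stdlib's `zipfile.is_zipfile()` and `tarfile.is_tarfile()` and require the result to match the uploaded extension. `EXPECTED_KIND`, `sniff_archive_kind` and `validate_archive_upload` are hypothetical names.

```python
import tarfile
import zipfile
from pathlib import Path
from typing import Optional

# Hypothetical table: allowed suffix -> container format the bytes must match.
# The suffix list follows the PR description (zip, tar.gz, tgz, tar.bz2, tar.xz).
EXPECTED_KIND = {
    ".zip": "zip",
    ".tar.gz": "tar",
    ".tgz": "tar",
    ".tar.bz2": "tar",
    ".tar.xz": "tar",
}


def sniff_archive_kind(path: Path) -> Optional[str]:
    """Classify a file by its contents: "zip", "tar" or None."""
    if zipfile.is_zipfile(path):
        return "zip"
    # is_tarfile() also recognises gzip-, bzip2- and xz-compressed tars.
    if tarfile.is_tarfile(path):
        return "tar"
    return None


def validate_archive_upload(filename: str, path: Path) -> str:
    """Reject uploads whose contents do not match their archive extension."""
    lower = filename.lower()
    expected = next(
        (kind for suffix, kind in EXPECTED_KIND.items() if lower.endswith(suffix)),
        None,
    )
    if expected is None:
        raise ValueError(f"not an accepted archive type: {filename!r}")
    actual = sniff_archive_kind(path)
    if actual != expected:
        raise ValueError(f"{filename!r} does not contain a valid {expected} archive")
    return expected
```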
Merged in v0.50.237 via #1243. Thank you @bergeouss! 🎉
Fixes #525
What
Allow uploading zip, tar.gz, tgz, tar.bz2, tar.xz archives — they are automatically extracted into a subfolder in the workspace.
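Naming the subfolder after the archive stem needs care with multi-part suffixes, because `Path("project.tar.gz").stem` is `"project.tar"`. A small sketch, assuming a regex similar in spirit to `_ARCHIVE_EXTS` (whose real pattern in `static/ui.js` is not shown here); `ARCHIVE_SUFFIX_RE`, `archive_stem`, `unique_subfolder`, the `"archive"` fallback and the `-1`, `-2` collision suffix are all hypothetical choices.

```python
import re
from pathlib import Path
from typing import Optional

# Hypothetical pattern; the real _ARCHIVE_EXTS regex lives in static/ui.js.
ARCHIVE_SUFFIX_RE = re.compile(r"\.(?:zip|tar\.gz|tgz|tar\.bz2|tar\.xz)$", re.IGNORECASE)


def archive_stem(filename: str) -> Optional[str]:
    """Folder name for an uploaded archive, or None if it is not one.

    Path("x.tar.gz").stem is "x.tar", so the whole multi-part suffix is
    stripped with the regex instead.
    """
    name = Path(filename).name  # drop any directory part of the upload name
    match = ARCHIVE_SUFFIX_RE.search(name)
    if match is None:
        return None
    # A bare ".zip" has an empty stem; fall back to a generic name.
    return name[: match.start()] or "archive"


def unique_subfolder(parent: Path, stem: str) -> Path:
    """Return parent/stem, or parent/stem-1, stem-2, ... if already taken."""
    candidate = parent / stem
    n = 1
    while candidate.exists():
        candidate = parent / f"{stem}-{n}"
        n += 1
    return candidate
```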
How
- `extract_archive()` in `api/upload.py`: detects archive type, extracts into a named subfolder (archive stem). Uses Python stdlib only (`zipfile`, `tarfile`)...traversal allowed.
- `POST /api/upload/extract`: same multipart format as `/api/upload`, but calls `extract_archive()`.
- `uploadPendingFiles()` auto-detects archive files via the `_ARCHIVE_EXTS` regex and routes them to the extract endpoint. The workspace file tree refreshes after extraction.
- Archive extensions added to the file input's `accept` attribute.

Changes
- `api/upload.py`: `extract_archive()`, `handle_upload_extract()`
- `api/routes.py`: new import + `/api/upload/extract` route
- `static/ui.js`: `_ARCHIVE_EXTS` regex, archive-aware upload logic, toast
- `static/index.html`: archive extensions in file input `accept`
- `static/i18n.js`: `archive_extracted` key in all 7 locales

Testing
2828 passed, 0 failed (full suite).