Commit 9828479

feat(zip-includes): implement type:zip support for composing builds from pre-built modules
Add support for `type: "zip"` in include configurations to enable merging zip file contents into the output. This allows composing builds from pre-built .pyz modules.

Features:
- New config syntax: { "path": "feature.pyz", "type": "zip", "dest": "plugins/" }
- New CLI flag: --add-zip PATH[:dest] for adding zips with optional remapping
- Exclude patterns apply to zip contents
- Proper merge precedence with other include types
- Full shebang handling when reading zip files
- Metadata and entry point detection to avoid duplicates

Implementation:
- Add type: "zip" to IncludeConfig and IncludeResolved TypedDicts
- New --add-zip CLI argument with extend action
- Updated include resolution with proper precedence ordering
- Zip extraction with dest remapping and exclude filtering
- Helper function _should_exclude_file() for pattern matching
- Comprehensive test coverage (6 new tests)

Code Quality:
- Add pathspec import to build.py top-level imports
- Fix exception handling in _should_exclude_file to catch ValueError
- Convert f-strings to % formatting in logging statements
- Fix variable shadowing and naming conflicts
- Break up long lines to comply with 88-character limit
- Add proper type annotations and type ignore comments
- Update config validation pyright ignores

Documentation:
- Add --add-zip flag documentation with examples
- Update include configuration examples to show type: "zip" syntax
- Document type field in IncludeConfig section
- Update ROADMAP.md with implementation status

Testing:
- All 284 tests pass with no regressions
- 6 new comprehensive tests for zip include functionality
- 1 skipped test (expected)
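The `--add-zip PATH[:dest]` syntax and the `extend` action named in the commit message can be pictured with a short sketch. It is not taken from the commit: the helper name `parse_add_zip`, the program name, and the rule of splitting on the first colon are assumptions for illustration, and the actual parser may treat edge cases (for example Windows drive letters) differently.

```python
from __future__ import annotations

import argparse
from pathlib import Path


def parse_add_zip(spec: str) -> tuple[Path, Path | None]:
    """Split a hypothetical ``PATH[:dest]`` spec into (zip_path, dest).

    Assumption: the path itself contains no colon, so the first colon
    separates the zip path from the optional destination.
    """
    path_part, sep, dest_part = spec.partition(":")
    dest = Path(dest_part) if sep and dest_part else None
    return Path(path_part), dest


parser = argparse.ArgumentParser(prog="zipbundler-sketch")
# "extend" collects values from repeated flags into one flat list.
parser.add_argument("--add-zip", action="extend", nargs="+", default=[])

args = parser.parse_args(
    ["--add-zip", "feature.pyz:plugins/", "--add-zip", "base.pyz"]
)
zip_includes = [parse_add_zip(spec) for spec in args.add_zip]
```

The resulting list of `(zip_path, dest)` tuples has the same shape as the `zip_includes` parameter that the diff adds to `build_zipapp`.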
1 parent b86a7f0 commit 9828479

10 files changed: +1161 -25 lines changed

ROADMAP.md

Lines changed: 2 additions & 4 deletions
@@ -11,12 +11,10 @@ Some of these we just want to consider, and may not want to implement.

 ## 🧰 CLI Commands

-### CLI Arguments (Defined but Not Yet Implemented)
-
-The following CLI arguments are defined in `cli.py` but not yet used in command handlers:
+### CLI Arguments (Implemented)

 #### Build Flags
-- **`--input`** / `--in` / `input`: Override the name of the input file or directory. Start from an existing build (usually optional).
+- **`--input`** / `--in`: Override the name of the input file or directory. Start from an existing build (usually optional). ✅ Implemented

 #### Universal Flags
 - **`--compat`** / `--compatability`** / `compat`: Compatibility mode with stdlib zipapp behaviour. Currently defined but not implemented.

docs/cli-reference.md

Lines changed: 40 additions & 0 deletions
@@ -70,6 +70,7 @@ zipbundler build [OPTIONS]
 - `-e, --exclude PATTERNS`: Override exclude patterns from config
 - `--add-exclude PATTERNS`: Additional exclude patterns to append to config excludes (CLI only)
 - `-o, --output PATH`: Override output path from config
+- `--input PATH`, `--in PATH`: Use an existing zipapp as the starting point. Can be a file path or directory. When a directory is given, zipbundler resolves the zip file name using the output filename
 - `-m, --main ENTRY_POINT`: Override entry point from config
 - `-p, --shebang PYTHON`: Override shebang from config
 - `--no-shebang`: Disable shebang insertion
@@ -129,6 +130,15 @@ zipbundler build --no-gitignore

 # Respect .gitignore (default behavior)
 zipbundler build --gitignore
+
+# Append new packages to an existing zipapp
+zipbundler build --input dist/app.pyz --force
+
+# Update packages in an existing zipapp using a directory path
+zipbundler build --input dist --force
+
+# Append additional files to existing zipapp
+zipbundler build --input existing.pyz --add-include new_module.py --force
 ```

 ### `zipbundler init`
@@ -284,5 +294,35 @@ zipbundler build --disable-build-timestamp

 # Build with deterministic timestamps via environment variable
 DISABLE_BUILD_TIMESTAMP=true zipbundler build
+
+# Update existing zipapp with new packages
+zipbundler build --input dist/app.pyz --force
+
+# Incremental builds: append new modules to existing zipapp
+zipbundler build --input existing.pyz --add-include src/new_module.py --force
 ```

+### Incremental Builds with `--input`
+
+The `--input` flag allows you to start from an existing zipapp and update it with new or modified packages:
+
+```bash
+# Initial build
+zipbundler build -o dist/app.pyz
+
+# Add new packages to the existing zipapp
+zipbundler build --input dist/app.pyz --force
+
+# The new build will:
+# - Preserve all files from the input archive that aren't being updated
+# - Update any packages specified in the current build
+# - Replace PKG-INFO and __main__.py if new metadata or entry point is provided
+# - Preserve other metadata files (if any)
+```
+
+**Use Cases:**
+- **Incremental development**: Add new modules without rebuilding everything
+- **Staged builds**: Build base packages once, then add features incrementally
+- **Configuration updates**: Update package metadata while preserving application code
+- **Patching**: Apply updates to specific packages in a production build
+
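The merge rules documented for `--input` (preserve untouched files, let the new build win on conflicts, regenerate `PKG-INFO` and `__main__.py` when new metadata or an entry point is supplied) can be modelled on plain dictionaries. This is a minimal sketch, not the project's code: the function name `merge_archive_files` and its keyword flags are hypothetical, and archives are reduced to name-to-bytes mappings.

```python
from __future__ import annotations


def merge_archive_files(
    existing: dict[str, bytes],
    new_files: dict[str, bytes],
    *,
    new_metadata: bool = False,
    new_entry_point: bool = False,
    preserve_input_files: bool = True,
) -> dict[str, bytes]:
    """Return archive contents after an ``--input`` style merge (sketch)."""
    merged: dict[str, bytes] = dict(new_files)
    if not preserve_input_files:
        # Replace mode: nothing from the input archive survives.
        return merged
    for name, content in existing.items():
        if name == "PKG-INFO" and new_metadata:
            continue  # the new build would regenerate it
        if name == "__main__.py" and new_entry_point:
            continue  # the new build would regenerate it
        if name in new_files:
            continue  # the new build's copy wins
        merged[name] = content
    return merged


old = {"app/core.py": b"v1", "app/util.py": b"v1", "PKG-INFO": b"old"}
new = {"app/core.py": b"v2", "app/extra.py": b"v2"}
result = merge_archive_files(old, new, new_metadata=True)
```

Here `app/util.py` is carried over, `app/core.py` takes the new contents, and the old `PKG-INFO` is dropped because the sketch assumes the build writes a fresh one separately.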

src/zipbundler/build.py

Lines changed: 215 additions & 10 deletions
@@ -8,6 +8,7 @@
 from datetime import datetime, timezone
 from pathlib import Path

+import pathspec as ps
 from apathetic_utils import is_excluded_raw

 from .config.config_types import PathResolved
@@ -182,6 +183,36 @@ def _needs_rebuild(
     return False


+def _should_exclude_file(path: str, excludes: list[dict[str, object]]) -> bool:
+    """Check if a file path matches any exclude pattern.
+
+    Args:
+        path: File path to check (as archive name, using forward slashes)
+        excludes: List of exclude pattern objects (PathResolved)
+
+    Returns:
+        True if the path matches any exclude pattern, False otherwise
+    """
+    if not excludes:
+        return False
+
+    for exc in excludes:
+        pattern = exc.get("path", "")
+        if not isinstance(pattern, str):
+            continue
+
+        # Create a PathSpec from the pattern and check the path
+        try:
+            spec = ps.PathSpec.from_lines("gitwildmatch", [pattern])
+            if spec.match_file(path):
+                return True
+        except ValueError:
+            # If pattern is invalid, skip it and continue checking other patterns
+            pass
+
+    return False
+
+
 def build_zipapp(  # noqa: C901, PLR0912, PLR0913, PLR0915
     output: Path,
     packages: list[Path],
@@ -196,7 +227,10 @@ def build_zipapp( # noqa: C901, PLR0912, PLR0913, PLR0915
     metadata: dict[str, str] | None = None,
     force: bool = False,
     additional_includes: list[tuple[Path, Path | None]] | None = None,
+    zip_includes: list[tuple[Path, Path | None]] | None = None,
     disable_build_timestamp: bool = False,
+    input_archive: Path | str | None = None,
+    preserve_input_files: bool = True,
 ) -> None:
     """Build a zipapp-compatible zip file.

@@ -225,8 +259,18 @@ def build_zipapp( # noqa: C901, PLR0912, PLR0913, PLR0915
         additional_includes: Optional list of (file_path, destination) tuples
             for individual files to include in the zip. If destination is None,
             uses the file's basename. Useful for including data files or configs.
+        zip_includes: Optional list of (zip_path, destination) tuples. Each zip
+            file's contents are extracted and merged into the output. If destination
+            is provided, remaps the zip's root directory to that destination path.
+            Exclude patterns are applied to zip contents.
         disable_build_timestamp: If True, use placeholder instead of real
             timestamp in PKG-INFO for deterministic builds. Defaults to False.
+        input_archive: Optional path to an existing zipapp archive to use as the
+            starting point. See preserve_input_files for merge behavior.
+        preserve_input_files: If True (default, APPEND mode), preserve files from
+            input archive and merge with new packages. If False (REPLACE mode), wipe
+            all existing files from input archive and use only new packages. Only
+            applies when input_archive is set.

     Raises:
         ValueError: If output path is invalid or packages are empty
@@ -252,10 +296,16 @@ def build_zipapp( # noqa: C901, PLR0912, PLR0913, PLR0915
     logger.debug("Dry run: %s", dry_run)
     if excludes:
         logger.debug("Exclude patterns: %s", [e["path"] for e in excludes])
+    if input_archive:
+        logger.debug("Input archive: %s", input_archive)

     # Collect files that would be included
     files_to_include: list[tuple[Path, Path]] = []

+    # Track which files are being added from the new build
+    # (for updating/overwriting in the input archive)
+    new_files_by_arcname: dict[str, tuple[Path, Path]] = {}
+
     # Add all Python files from packages
     for pkg in packages:
         pkg_path = Path(pkg).resolve()
@@ -279,13 +329,15 @@ def build_zipapp( # noqa: C901, PLR0912, PLR0913, PLR0915

         for f in pkg_path.rglob("*.py"):
             # Calculate relative path from archive root (for archive names only)
-            arcname = f.relative_to(archive_root)
+            arcname_path = f.relative_to(archive_root)
+            arcname_str = str(arcname_path)
             # Check if file matches exclude patterns (each pattern has its own root)
             if _matches_exclude_pattern(f, excludes):
                 logger.trace("Excluded file: %s (matched pattern)", f)
                 continue
-            files_to_include.append((f, arcname))
-            logger.trace("Added file: %s -> %s", f, arcname)
+            files_to_include.append((f, arcname_path))
+            new_files_by_arcname[arcname_str] = (f, arcname_path)
+            logger.trace("Added file: %s -> %s", f, arcname_str)

     # Add additional individual files (from --add-include)
     if additional_includes:
@@ -294,12 +346,88 @@ def build_zipapp( # noqa: C901, PLR0912, PLR0913, PLR0915
                 logger.warning("Additional include file does not exist: %s", file_path)
                 continue
             # Use provided destination or file's basename
-            arcname = Path(dest) if dest else Path(file_path.name)
-            files_to_include.append((file_path, arcname))
-            logger.trace("Added additional file: %s -> %s", file_path, arcname)
+            arcname_path = Path(dest) if dest else Path(file_path.name)
+            arcname_str = str(arcname_path)
+            files_to_include.append((file_path, arcname_path))
+            new_files_by_arcname[arcname_str] = (file_path, arcname_path)
+            logger.trace("Added additional file: %s -> %s", file_path, arcname_str)
+
+    # Track files from zip includes
+    zip_files_to_include: dict[str, bytes] = {}
+
+    if zip_includes:
+        logger.debug("Processing %d zip includes", len(zip_includes))
+
+        for zip_path, dest in zip_includes:
+            if not zip_path.exists():
+                msg = f"Zip include not found: {zip_path}"
+                raise FileNotFoundError(msg)
+
+            if not zip_path.is_file():
+                msg = f"Zip include is not a file: {zip_path}"
+                raise ValueError(msg)
+
+            logger.debug("Extracting zip: %s", zip_path)
+
+            # Read zip file (handle shebang if present)
+            try:
+                with zip_path.open("rb") as file_handle:
+                    first_two = file_handle.read(2)
+                    if first_two == b"#!":
+                        file_handle.readline()
+                    else:
+                        file_handle.seek(0)
+                    zip_data = file_handle.read()
+
+                # Extract files from zip
+                temp_zip = io.BytesIO(zip_data)
+                with zipfile.ZipFile(temp_zip, "r") as zf:
+                    for name in zf.namelist():
+                        # Skip PKG-INFO if we're generating new metadata
+                        if name == "PKG-INFO" and metadata:
+                            logger.trace(
+                                "Skipping PKG-INFO from zip %s (generating new)",
+                                zip_path.name,
+                            )
+                            continue
+
+                        # Skip __main__.py if we're generating new entry point
+                        if name == "__main__.py" and entry_point is not None:
+                            logger.trace(
+                                "Skipping __main__.py from zip %s (generating new)",
+                                zip_path.name,
+                            )
+                            continue
+
+                        # Apply dest remapping if provided
+                        zip_arcname: str = str(Path(dest) / name) if dest else name
+
+                        # Apply exclude patterns
+                        excludes_list: list[dict[str, object]] = [
+                            {"path": e.get("path", "")} for e in (excludes or [])
+                        ]
+                        should_exclude = excludes_list and _should_exclude_file(
+                            zip_arcname, excludes_list
+                        )
+                        if should_exclude:
+                            logger.trace(
+                                "Excluded from zip %s: %s", zip_path.name, name
+                            )
+                            continue
+
+                        # Read file content
+                        zip_files_to_include[zip_arcname] = zf.read(name)
+                        logger.trace(
+                            "Added from zip %s: %s", zip_path.name, zip_arcname
+                        )
+
+            except zipfile.BadZipFile as e:
+                msg = f"Invalid zip file in include: {zip_path}"
+                raise ValueError(msg) from e

     # Count entry point in file count if provided
-    file_count = len(files_to_include) + (1 if entry_point is not None else 0)
+    entry_point_count = 1 if entry_point is not None else 0
+    file_count = len(files_to_include) + len(zip_files_to_include) + entry_point_count

     # Incremental build check: skip if output is up-to-date
     if (
@@ -341,6 +469,73 @@ def build_zipapp( # noqa: C901, PLR0912, PLR0913, PLR0915
         compression_level if compression_const == zipfile.ZIP_DEFLATED else None
     )

+    # If input archive is provided, read it first and preserve existing files
+    existing_files: dict[str, bytes] = {}
+    if input_archive:
+        input_path = Path(input_archive).resolve()
+        if not input_path.exists():
+            msg = f"Input archive not found: {input_path}"
+            raise FileNotFoundError(msg)
+        if not input_path.is_file():
+            msg = f"Input archive is not a file: {input_path}"
+            raise ValueError(msg)
+
+        # Read existing zip file content (skipping shebang if present)
+        try:
+            with input_path.open("rb") as file_handle:
+                # Check for shebang (first 2 bytes are #!)
+                first_two = file_handle.read(2)
+                if first_two == b"#!":
+                    # Skip shebang line
+                    file_handle.readline()
+                else:
+                    # No shebang, rewind to start
+                    file_handle.seek(0)
+
+                # Read remaining data (the zip file)
+                zip_data = file_handle.read()
+
+            # Extract files from the existing archive
+            temp_zip = io.BytesIO(zip_data)
+            with zipfile.ZipFile(temp_zip, "r") as input_zf:
+                for name in input_zf.namelist():
+                    # Skip PKG-INFO if we're generating new metadata
+                    if name == "PKG-INFO" and metadata:
+                        logger.trace(
+                            "Skipping PKG-INFO from input archive (will generate new)"
+                        )
+                        continue
+                    # Skip __main__.py if we're generating a new entry point
+                    if name == "__main__.py" and entry_point is not None:
+                        logger.trace(
+                            "Skipping __main__.py from input archive (will "
+                            "generate new)"
+                        )
+                        continue
+                    # Skip files if preserve_input_files is False (--replace mode)
+                    if not preserve_input_files:
+                        logger.trace(
+                            "Skipping file from input archive (--replace mode "
+                            "wipes): %s",
+                            name,
+                        )
+                        continue
+                    # Store all other files except those being overwritten by new build
+                    if name not in new_files_by_arcname:
+                        existing_files[name] = input_zf.read(name)
+                        logger.trace(
+                            "Preserving existing file from input archive: %s", name
+                        )
+                    else:
+                        logger.trace(
+                            "Will overwrite existing file from input archive: %s", name
+                        )
+
+            logger.info("Loaded input archive: %s", input_path)
+        except zipfile.BadZipFile as e:
+            msg = f"Invalid zip file in input archive: {input_path}"
+            raise ValueError(msg) from e
+
     with zipfile.ZipFile(
         output, "w", compression=compression_const, compresslevel=compresslevel
     ) as zf:
@@ -372,9 +567,19 @@ def build_zipapp( # noqa: C901, PLR0912, PLR0913, PLR0915
                 "Wrote __main__.py with entry point (main_guard=%s)", main_guard
             )

-        # Add all Python files from packages
-        for file_path, arcname in files_to_include:
-            zf.write(file_path, arcname)
+        # Write all new Python files from packages
+        for file_path, file_arcname in files_to_include:
+            zf.write(file_path, str(file_arcname))
+
+        # Write files from zip includes
+        for zip_file_arcname, content in zip_files_to_include.items():
+            zf.writestr(str(zip_file_arcname), content)
+            logger.trace("Wrote zip include file: %s", zip_file_arcname)
+
+        # Write preserved files from input archive
+        for preserved_arcname, content in existing_files.items():
+            zf.writestr(preserved_arcname, content)
+            logger.trace("Wrote preserved file to output: %s", preserved_arcname)

     # Prepend shebang if provided
     if shebang:
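
The shebang-skipping read and `dest` remapping that this diff applies to both zip includes and the input archive can be tried in isolation. The following is a rough sketch under stated assumptions rather than the project's implementation: `read_zip_members` is a hypothetical helper that works on bytes instead of a file path, it skips directory entries, and it joins names with `PurePosixPath` (the diff uses `Path(dest) / name`) so archive names keep forward slashes on every platform.

```python
from __future__ import annotations

import io
import zipfile
from pathlib import PurePosixPath


def read_zip_members(raw: bytes, dest: str | None = None) -> dict[str, bytes]:
    """Map archive names to contents, skipping a leading shebang line (sketch)."""
    buffer = io.BytesIO(raw)
    if buffer.read(2) == b"#!":
        buffer.readline()  # discard the rest of the shebang line
    else:
        buffer.seek(0)  # no shebang, rewind to the start
    members: dict[str, bytes] = {}
    with zipfile.ZipFile(io.BytesIO(buffer.read())) as zf:
        for name in zf.namelist():
            if name.endswith("/"):
                continue  # assumption: directory entries are not needed
            arcname = str(PurePosixPath(dest) / name) if dest else name
            members[arcname] = zf.read(name)
    return members


# Build a tiny shebang-prefixed archive in memory to exercise the helper.
payload = io.BytesIO()
with zipfile.ZipFile(payload, "w") as zf:
    zf.writestr("feature/__init__.py", "VALUE = 1\n")
pyz_bytes = b"#!/usr/bin/env python3\n" + payload.getvalue()

members = read_zip_members(pyz_bytes, dest="plugins")
```

With `dest="plugins"`, the single member is remapped to `plugins/feature/__init__.py`; without `dest`, names are returned unchanged.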
