Vendor libmspack in-tree to drop 7-Zip runtime dependency for firmware update #2172

@ten9876

Plan: Vendor libmspack in-tree to drop the 7-Zip runtime dependency for firmware update

Goal

Replace the QProcess-to-7z shell-out in FirmwareStager::extractFromMsi() with in-tree code so the firmware-update path:

  • Works without any system tool installed (especially relevant for AppImage / DMG / sandboxed installs)
  • Runs in environments that disallow process spawning (Snap, Flatpak, macOS Mac App Store sandboxing, future-proofing)
  • Returns structured errors we can present meaningfully in the UI
  • Doesn't break silently if a future Linux distro repackages 7z under a different binary name

Non-goals

  • We are not writing a general-purpose CAB or LZX implementation. We're vendoring a battle-tested library and writing the smallest-possible glue to read MSI streams.
  • We are not removing the existing InnoSetup .exe byte-pattern path. That code stays for v4.1.x and earlier installers.
  • We are not trying to handle every possible WiX MSI structure. We only need to extract cab*.cab streams from MSIs FlexRadio ships. If they switch to a radically different installer format again, that's a separate effort.

License confirmation

Verified by reading COPYING.LIB and individual source-file headers in libmspack 0.11alpha:

  • License: GNU LGPL v2.1 (the file headers explicitly say "version 2.1", not 2.0 as the project's website summary suggests)
  • AetherSDR license: GPL-3.0
  • Compatibility: LGPL-2.1 code can be combined into a GPL-3.0 project. LGPL-2.1 §3 lets a recipient apply the terms of the ordinary GPL to a copy of the library, at version 2 or any later published version, which includes GPL-3.
  • Distribution: as long as the vendored source keeps its original copyright headers and we include COPYING.LIB in the third-party tree, we're compliant.

Action item before merge: add a third_party/libmspack/LICENSE.txt pointing to the bundled COPYING.LIB and confirm the AetherSDR LICENSE file or a THIRD_PARTY_LICENSES.md lists libmspack alongside the other vendored libs (DeepFilterNet, libspecbleach, r8b, rade, opus, libmosquitto).

Architecture

The MSI we need to read has two layers we need to crack:

SmartSDR_v4.2.18_x64.msi
└─ OLE Compound File (Microsoft CFB) ──── outer wrapper
   ├─ stream "cab1.cab" (LZX-compressed) ─┐
   ├─ stream "cab2.cab" (LZX-compressed) ─┤  Microsoft CAB format
   ├─ stream "cab3.cab" (LZX-compressed) ─┤  + LZX compression
   └─ … etc.                              ─┘
      └─ (inside each cab, one hashed-name payload that is a raw `Salted__` blob)

libmspack handles the CAB+LZX layer (the inner one). For the OLE CFB layer (outer) we write a small reader ourselves — it's a well-documented Microsoft format and we only need read-only stream extraction.

Component split

| Component | Source | Purpose |
|---|---|---|
| OleCompoundFile (new, ~300 LoC C++) | src/core/OleCompoundFile.{h,cpp} | Read MS-CFB; locate streams by name; concatenate sectors into a contiguous byte buffer |
| libmspack (vendored, ~6000 LoC C) | third_party/libmspack/ (subset) | Parse CAB structure; decompress LZX-encoded payloads |
| Glue in FirmwareStager::extractFromMsi() | existing file | Replace QProcess calls with calls to the above |

Data flow

QFile(msi)
  → OleCompoundFile::open()                  // parse CFB header, FAT, directory
  → OleCompoundFile::readStreamsMatching("cab*.cab")
       returns: QList<QPair<QString, QByteArray>>  // name → bytes, one pair per cab
  → for each cab buffer:
       libmspack: open from memory          // via custom mspack_system that reads from QByteArray
       libmspack: extract single payload     // CAB → raw bytes
       check first 8 bytes == "Salted__"
       collect into list of (size, bytes)
  → sort by size desc, pick by m_modelFamily // existing logic
  → write chosen blob to outPath             // existing logic

The key trick: libmspack supports custom I/O via the mspack_system struct (open/read/write/seek/close function pointers). We provide a tiny in-memory implementation so libmspack reads the cab bytes from a QByteArray instead of a real file. No temp files needed — everything stays in memory until we write the chosen .ssdr to staging.

Implementation phases

Phase 1 — Vendor libmspack (subset)

Files to copy from libmspack 0.11alpha into third_party/libmspack/mspack/:

| File | Lines | Reason |
|---|---|---|
| mspack.h | 2385 | Public API, required |
| system.c | 240 | I/O abstraction layer |
| cab.h | 140 | CAB structures |
| cabd.c | 1510 | CAB decompressor |
| mszip.h | 126 | MSZIP support (some CABs use it as fallback) |
| mszipd.c | 504 | MSZIP decoder |
| lzx.h | 220 | LZX header |
| lzxd.c | 781 | LZX decoder (the main one for our MSIs) |
| crc32.c | 95 | CRC-32 validation |
| crc32.h | 17 | Header for crc32.c |
| macros.h | 64 | Utility macros |

Total: ~6,082 lines C.

Files to omit (encoders, unrelated formats):

  • cabc.c, lzxc.c, mszipc.c — encoders (we only decompress)
  • chm.h, chmc.c, chmd.c — Windows help format (CHM)
  • hlp.h, hlpc.c, hlpd.c — older help format (HLP)
  • kwaj.h, kwajc.c, kwajd.c, szdd*.c, lzss* — KWAJ/SZDD legacy
  • lit.h, litc.c, litd.c — Microsoft LIT
  • oab.h, oabc.c, oabd.c — Outlook Address Book
  • des.h — DES (used only by CHM)

Layout:

third_party/libmspack/
├── COPYING.LIB                ← LGPL-2.1 text
├── README.md                  ← brief note on what's vendored, why, and from where
├── mspack/
│   ├── mspack.h
│   ├── system.c
│   ├── cab.h
│   ├── cabd.c
│   ├── mszip.h, mszipd.c
│   ├── lzx.h, lzxd.c
│   ├── crc32.c, crc32.h
│   └── macros.h
└── CMakeLists.txt             ← static library target

third_party/libmspack/CMakeLists.txt (sketch):

add_library(mspack_static STATIC
    mspack/system.c
    mspack/cabd.c
    mspack/mszipd.c
    mspack/lzxd.c
    mspack/crc32.c
)
target_include_directories(mspack_static PUBLIC mspack)
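# Assumption: upstream guards config.h with "#ifdef HAVE_CONFIG_H", so leave the
# macro undefined. Defining it to 0 would still satisfy #ifdef and pull in config.h.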
# Suppress noise from upstream code we don't want to modify
target_compile_options(mspack_static PRIVATE -w)  # GCC/Clang

Top-level CMakeLists.txt addition:

add_subdirectory(third_party/libmspack)
target_link_libraries(AetherSDR PRIVATE mspack_static)

Phase 2 — Write OleCompoundFile

Reference: [MS-CFB] Compound File Binary File Format — the canonical Microsoft spec. It's actually quite readable; we only need a small subset.

API (new file src/core/OleCompoundFile.h):

namespace AetherSDR {

class OleCompoundFile {
public:
    // Returns false if the file is not a valid CFB or can't be read.
    bool open(const QString& path);

    // List all stream names in the compound file.
    QStringList streamNames() const;

    // Read a stream by name into a contiguous byte buffer.
    // Returns empty QByteArray if not found.
    QByteArray readStream(const QString& name) const;

    // Convenience: read all streams whose name matches a wildcard pattern
    // (e.g. "cab*.cab"). Returns name → bytes pairs.
    QList<QPair<QString, QByteArray>> readStreamsMatching(const QString& wildcard) const;

private:
    // CFB structures (just what we need)
    struct Header { /* sector size, FAT info, root dir start */ };
    struct DirEntry { /* name, type, start sector, size */ };

    bool parseHeader();
    bool parseFat();
    bool parseDirectory();
    QByteArray readSectorChain(quint32 startSector, qint64 totalSize, bool isMini) const;

    QFile m_file;
    Header m_header{};
    QList<quint32> m_fat;       // sector → next sector
    QList<quint32> m_miniFat;   // for streams smaller than the mini-stream cutoff
    QByteArray m_miniStreamData;
    QList<DirEntry> m_dirEntries;
};

} // namespace AetherSDR

Implementation notes:

  • CFB sector size comes from the header (typically 512 or 4096 bytes). The 16-bit field at offset 30 is a power-of-two exponent: sector size = 2^value, so 9 gives 512 and 12 gives 4096.
  • FAT chains link sectors: next = m_fat[current]. Terminator is 0xFFFFFFFE (ENDOFCHAIN). Free is 0xFFFFFFFF.
  • Directory is itself a stream (chained via FAT); each entry is exactly 128 bytes.
  • Stream names in the directory are UTF-16LE, null-terminated, length in bytes at offset 64 of the entry (includes the null terminator, so subtract 2 to get string-byte length).
  • The Mini Stream is for small files (<4096 bytes typically). Our CABs are megabytes — they live in the regular FAT. We need to handle the mini-stream code path defensively (guard against malformed files trying to flag a large stream as mini), but in practice we'll never read a mini-stream.
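
The notes above can be sketched as three small helpers: reading the sector shift, walking a FAT chain, and decoding a directory-entry name. This is only an illustration under stated assumptions: it needs C++17, the names (readLe16, sectorSize, sectorChain, dirEntryName) are hypothetical, and std types stand in for the Qt types the real OleCompoundFile would use. Rejecting every sector shift other than 9 and 12 is a choice made for this sketch, not something the plan requires.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical helper names; std types stand in for the Qt types in the plan.
constexpr std::uint32_t kEndOfChain = 0xFFFFFFFEu;  // ENDOFCHAIN terminator
constexpr std::uint32_t kFreeSect   = 0xFFFFFFFFu;  // free (unallocated) sector

// Read a little-endian 16-bit value at byte offset `off`.
inline std::uint16_t readLe16(const std::vector<std::uint8_t>& buf, std::size_t off) {
    return static_cast<std::uint16_t>(buf[off] | (buf[off + 1] << 8));
}

// Sector size = 2^shift, where shift is the 16-bit field at header offset 30.
// This sketch accepts only the two typical values (512 and 4096 bytes).
inline std::optional<std::size_t> sectorSize(const std::vector<std::uint8_t>& header) {
    if (header.size() < 32) return std::nullopt;
    const std::uint16_t shift = readLe16(header, 30);
    if (shift != 9 && shift != 12) return std::nullopt;
    return std::size_t{1} << shift;
}

// Follow next = fat[current] until ENDOFCHAIN. Returns nullopt when the chain
// leaves the table, reaches a free sector, or outgrows the table (a cycle).
inline std::optional<std::vector<std::uint32_t>>
sectorChain(const std::vector<std::uint32_t>& fat, std::uint32_t start) {
    std::vector<std::uint32_t> chain;
    std::uint32_t cur = start;
    while (cur != kEndOfChain) {
        if (cur == kFreeSect || cur >= fat.size() || chain.size() >= fat.size())
            return std::nullopt;
        chain.push_back(cur);
        cur = fat[cur];
    }
    return chain;
}

// Decode the UTF-16LE name of a 128-byte directory entry. The byte length at
// offset 64 includes the 2-byte null terminator, which is dropped here.
inline std::optional<std::u16string> dirEntryName(const std::vector<std::uint8_t>& entry) {
    if (entry.size() < 128) return std::nullopt;
    const std::uint16_t lenBytes = readLe16(entry, 64);
    if (lenBytes < 2 || lenBytes > 64 || (lenBytes % 2) != 0) return std::nullopt;
    const std::size_t chars = (lenBytes - 2u) / 2u;
    std::u16string name;
    for (std::size_t i = 0; i < chars; ++i)
        name.push_back(static_cast<char16_t>(readLe16(entry, 2 * i)));
    return name;
}
```

Returning std::optional keeps malformed input (a looping chain, an out-of-range length) as an ordinary failure the caller can turn into a structured error, which matches the goal of presenting meaningful errors in the UI.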

Estimated size: ~300 LoC C++ for the full reader, including error handling and a small set of internal asserts. Lots of bit-twiddling but no algorithmic complexity.

Wildcard implementation: we don't need a real glob. cab*.cab is the only pattern we care about — a simple name.startsWith("cab") && name.endsWith(".cab") check is enough. We'll keep the API generic for future use but implement the matcher minimally.
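
A minimal sketch of that matcher, assuming case-sensitive names and using std::string_view in place of QString; the function name isCabStreamName is hypothetical:

```cpp
#include <string_view>

// Minimal matcher for the one pattern the plan needs: "cab*.cab".
// Hypothetical name; std::string_view stands in for QString.
inline bool isCabStreamName(std::string_view name) {
    constexpr std::string_view prefix = "cab";
    constexpr std::string_view suffix = ".cab";
    // Length guard keeps both substr() calls in range for short names.
    if (name.size() < prefix.size() + suffix.size()) return false;
    return name.substr(0, prefix.size()) == prefix &&
           name.substr(name.size() - suffix.size()) == suffix;
}
```

For example, "cab1.cab" and "cab.cab" match, while "cab", "data.cab", and "cab1.txt" do not.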

Phase 3 — In-memory mspack_system adapter

libmspack normally opens cabs by filename. To open one from memory (a QByteArray containing cab bytes), we provide a custom mspack_system struct.

The interface (from mspack.h):

struct mspack_system {
  struct mspack_file *(*open)(struct mspack_system *self, const char *filename, int mode);
  void (*close)(struct mspack_file *file);
  int (*read)(struct mspack_file *file, void *buffer, int bytes);
  int (*write)(struct mspack_file *file, void *buffer, int bytes);
  int (*seek)(struct mspack_file *file, off_t offset, int mode);
  off_t (*tell)(struct mspack_file *file);
  void (*message)(struct mspack_file *file, const char *format, ...);
  void *(*alloc)(struct mspack_system *self, size_t bytes);
  void (*free)(void *ptr);
  void (*copy)(void *src, void *dest, size_t bytes);
};

Our adapter (in src/core/MspackMemSystem.cpp, ~150 LoC of C-style code):

  • open(): ignore filename, return our pre-set buffer wrapper
  • read(): memcpy from buffer at current offset
  • seek(): adjust current offset
  • tell(): return current offset
  • write(): receives the extracted file output. The simplest option is to pass it through to a normal FILE* on disk and fake only the input side; the alternative is to accumulate it in a QByteArray (see "Extraction destination" below).
  • alloc/free/copy: delegate to malloc/free/memcpy

Two function tables:

  1. g_inMemSystem — for reading cabinets from a QByteArray
  2. Standard libmspack file system — for writing extracted output (or we can write to memory too)

Extraction destination: easiest is a temp file via QTemporaryFile. Cleaner is in-memory; we'd need to capture libmspack's write() calls and accumulate into a QByteArray. Given firmware blobs are 64–386 MB and we're already reading the whole MSI into memory anyway, let's go in-memory for both sides — same complexity, fewer disk hits.
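
The buffer logic behind those callbacks could look roughly like the sketch below. It deliberately does not include mspack.h or fill in an mspack_system; wiring these functions into the function-pointer table is left out. Everything named here (MemFile, SeekFrom, memRead, memWrite, memSeek, memTell) is hypothetical, std::vector stands in for QByteArray, and the return conventions are this sketch's own choice that would need checking against mspack.h.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical buffer-backed "file". The real adapter would hand this out as
// the handle returned by open(). std::vector stands in for QByteArray.
struct MemFile {
    const std::vector<std::uint8_t>* input = nullptr;  // cab bytes (read side)
    std::vector<std::uint8_t> output;                   // extracted payload (write side)
    std::size_t pos = 0;                                // current read offset
};

enum class SeekFrom { Start, Current, End };  // sketch-local, not libmspack's constants

// Copy up to `bytes` from the current offset. Returns the count copied
// (0 at end of buffer) or -1 on bad arguments.
inline int memRead(MemFile& f, void* buffer, int bytes) {
    if (!f.input || !buffer || bytes < 0) return -1;
    const std::size_t avail = f.input->size() - f.pos;
    const std::size_t n = std::min(static_cast<std::size_t>(bytes), avail);
    if (n > 0) std::memcpy(buffer, f.input->data() + f.pos, n);
    f.pos += n;
    return static_cast<int>(n);
}

// Append decompressed bytes to the output buffer. Returns the count written.
inline int memWrite(MemFile& f, const void* buffer, int bytes) {
    if (!buffer || bytes < 0) return -1;
    const auto* p = static_cast<const std::uint8_t*>(buffer);
    f.output.insert(f.output.end(), p, p + bytes);
    return bytes;
}

// Move the read offset. Returns 0 on success, -1 if the target is out of range.
inline int memSeek(MemFile& f, long long offset, SeekFrom from) {
    if (!f.input) return -1;
    const long long size = static_cast<long long>(f.input->size());
    long long base = 0;
    if (from == SeekFrom::Current) base = static_cast<long long>(f.pos);
    else if (from == SeekFrom::End) base = size;
    const long long target = base + offset;
    if (target < 0 || target > size) return -1;
    f.pos = static_cast<std::size_t>(target);
    return 0;
}

// Report the current read offset.
inline long long memTell(const MemFile& f) { return static_cast<long long>(f.pos); }
```

Keeping the read cursor and the output buffer in one struct means a single handle can serve both the cabinet input and the extracted output, at the cost of holding both in memory at once.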

Phase 4 — Replace extractFromMsi() body

Before (current code, ~120 lines, shells out to 7z):

const QString sevenZ = findExtractionTool();
if (sevenZ.isEmpty()) { /* error */ }
runSevenZip({"x", "-y", "-o" + tempDir, msiPath, "cab*.cab"}, ...);
// extract each cab
// scan for Salted__
// pick by family

After (~80 lines, native):

OleCompoundFile cfb;
if (!cfb.open(msiPath)) {
    emit stageFailed("Could not open MSI as compound file: " + msiPath);
    return false;
}
const auto cabStreams = cfb.readStreamsMatching("cab*.cab");
if (cabStreams.isEmpty()) {
    emit stageFailed("No CAB streams in MSI.");
    return false;
}

emit stageProgress(75, QString("Decompressing %1 CAB streams...").arg(cabStreams.size()));

QList<Blob> blobs;
for (const auto& [name, bytes] : cabStreams) {
    QByteArray ssdr;
    if (!extractCabPayloadInMemory(bytes, ssdr)) {
        emit stageFailed("Failed to decompress " + name);
        return false;
    }
    if (ssdr.startsWith("Salted__"))
        blobs.append({name, ssdr});
}

// sort by size desc, pick by m_modelFamily, write to outPath — existing logic

extractCabPayloadInMemory() is a ~50-line helper that wires the in-memory mspack_system to a mscab_decompressor, opens the cab, walks the file list, extracts the (single, in our case) payload into a QByteArray. Returns false on any libmspack error.
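
The filtering and ordering steps around that loop (keep only payloads that start with Salted__, then sort by size descending) can be sketched on their own. The Blob struct below is a hypothetical stand-in for the plan's type, std::string holds raw bytes in place of QByteArray, and the final pick by m_modelFamily is existing logic that is not reproduced here.

```cpp
#include <algorithm>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for the plan's Blob; std::string holds raw bytes.
struct Blob {
    std::string name;
    std::string bytes;
};

// True when the payload begins with the 8-byte "Salted__" marker.
inline bool isSaltedBlob(const std::string& bytes) {
    static const char kMagic[] = "Salted__";
    return bytes.size() >= 8 && std::memcmp(bytes.data(), kMagic, 8) == 0;
}

// Keep only Salted__ payloads and order them largest first.
// stable_sort preserves stream order among payloads of equal size.
inline std::vector<Blob> saltedBlobsBySizeDesc(std::vector<Blob> candidates) {
    candidates.erase(
        std::remove_if(candidates.begin(), candidates.end(),
                       [](const Blob& b) { return !isSaltedBlob(b.bytes); }),
        candidates.end());
    std::stable_sort(candidates.begin(), candidates.end(),
                     [](const Blob& a, const Blob& b) {
                         return a.bytes.size() > b.bytes.size();
                     });
    return candidates;
}
```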

Phase 5 — Drop the 7z dependency

  • Delete findExtractionTool() and the per-platform 7z install instructions from extractFromMsi()'s error path
  • Update the README dependency list (currently does not mention 7z but check anyway)
  • Update CHANGELOG.md for the next release: "MSI extraction is now self-contained; no external 7-Zip required"
  • Remove the QProcess include from FirmwareStager.cpp if nothing else in that file uses it. Other files in the project still use QProcess, so their includes stay.

Phase 6 — Tests

Unit tests (tests/ole_compound_file_test.cpp):

  • Open the v4.2.18 MSI from a fixed test fixture
  • Assert exactly 6 cab*.cab streams
  • Read cab1.cab, assert size matches MSI File table value (~64 MB compressed before LZX decompression)
  • Round-trip: read full MSI byte range → reassemble streams → byte-identical to original

Integration test (tests/firmware_extract_test.cpp):

  • Given the v4.2.18 MSI on disk, call extractFromMsi(msi, out)
  • Assert output file:
    • Exists
    • Starts with Salted__
    • Size = 386,289,360 bytes (FLEX-6x00 case) or 64,000,000 bytes (FLEX-9600 case, depending on m_modelFamily)
    • MD5 = 9e8888dc0558ee420ed82f370f805025 for FLEX-6x00 (matches the value Jeremy verified by flashing)

Test fixture handling: the v4.2.18 MSI is 669 MB and FlexRadio's licensed property — we don't check it into the repo. Instead the integration test:

  1. Looks for the MSI at ~/build/reference/SmartSDR_v4.2.18_x64.msi (where Jeremy keeps it)
  2. Falls back to environment variable AETHERSDR_TEST_MSI=/path/to/msi
  3. Skips with SKIP status if neither is available — CI doesn't have the file, so the test just doesn't run there

Manual smoke test:

  • Build, run UI, click Select Installer → pick the MSI → confirm staged file MD5 matches 9e8888dc0558ee420ed82f370f805025

Build system integration

CI Docker image

The Docker CI image (ghcr.io/ten9876/aethersdr-ci) currently has p7zip for the existing 7z-based extraction. After this change we can drop that, but we should keep it for one release cycle in case the rollback path needs the old binary.

No new system packages required — libmspack vendored in-tree means we add zero runtime deps and zero build deps beyond what we already have.

Per-platform build verification

| Platform | Status |
|---|---|
| Linux x86_64 | Trivial: libmspack is plain C99, builds with GCC |
| Linux aarch64 (Pi) | Same as x86_64 |
| macOS (Apple Silicon + Intel) | Plain C99, builds with Apple Clang |
| Windows (MSVC) | libmspack is portable C99; should be fine but worth verifying once. May need _CRT_SECURE_NO_WARNINGS or similar pragma. |

Add a BUILD_FROM_SOURCE.md note for distros packaging AetherSDR: libmspack is bundled in-tree, and the build does not look for or prefer a system copy.

Risks & mitigations

| Risk | Likelihood | Mitigation |
|---|---|---|
| libmspack alpha-status surprise: 0.11alpha is still tagged alpha after 6+ years. Author's stated reason is feature incompleteness, not bugs, and this is a mature, well-tested codebase | Low | The CAB+LZX paths we exercise have been used by cabextract for 20+ years. We're using a tiny subset of stable functionality. Pin to a specific source archive version; bump deliberately. |
| MSI structure changes in future SmartSDR releases: FlexRadio could switch from WiX 6 to something else | Low | Format detection by magic bytes already in place. If they switch again, file an issue with a sample, write a third extractor. The existing dispatcher in verifyAndExtract() is designed for this. |
| LZX-Delta or other rare CAB variants: some CABs use LZX-Delta or Quantum compression, not plain LZX | Low | libmspack supports LZX, MSZIP, and Quantum. The MSI we tested uses LZX:18, which is the common WiX default. The Phase 1 file list does not include the Quantum decoder (qtm.h, qtmd.c upstream), so if we hit a Quantum CAB we would need to add it to the vendored subset. LZX-Delta is rare and probably not used by WiX. |
| Large file memory pressure: reading 669 MB MSI into memory (we already do this in the _EXE byte-pattern path), then 386 MB cab streams, then 386 MB decompressed payload, for a peak of ~1.4 GB | Medium | This is acceptable on dev machines but tight on Raspberry Pi (8 GB RAM, but other apps running). Mitigation: stream the OLE CFB sector reads directly into libmspack via the mspack_system::read() callback rather than reading the full cab into memory first. ~1 day extra effort if needed; defer until/unless Pi users complain. |
| Vendored library security updates: libmspack has had ~6 CVEs over 20 years (rare but real) | Low | Watch the libmspack releases page; cherry-pick CVE fixes by re-vendoring the affected file(s). The vendor README should record the version + date for easy comparison. |
| License contamination if we accidentally use an LGPL-only API: very unlikely with our static-link LGPL-2.1 setup but worth flagging | Very low | LGPL-2.1 §3 explicitly allows opt-in to GPL-3 terms. Document the licensing decision in third_party/libmspack/README.md and THIRD_PARTY_LICENSES.md. |

Effort estimate

Breaking it down:

| Phase | Estimate | Notes |
|---|---|---|
| Phase 1: vendor + CMake | 0.5 day | Mostly file copying + small CMakeLists |
| Phase 2: OleCompoundFile | 1 day | Spec is clear, only the read path |
| Phase 3: mspack_system adapter | 0.5 day | ~150 lines of straightforward C |
| Phase 4: rewrite extractFromMsi() | 0.5 day | Mostly deletion + replacement |
| Phase 5: drop 7z + docs | 0.25 day | README, CHANGELOG, error message cleanup |
| Phase 6: tests | 1 day | Unit + integration; test fixture handling |
| Total | ~3.75 days | |

If only working a few hours a day, allow a calendar week.

Open questions

  1. Where to call out the licensing addition: current vendored libs (DeepFilter, libspecbleach, etc.) don't have a unified THIRD_PARTY_LICENSES.md. Creating one in this PR is a small but real cleanup opportunity. Recommended: yes, add it here and list libmspack in it.

  2. Should the OleCompoundFile reader be its own header even if no other AetherSDR code uses it? — yes. It's a well-bounded utility; future contributors might want it for other Microsoft-format reading (CHM, MSP, etc.).

  3. In-memory vs temp-file extraction: pure-memory means peak ~1.4 GB transient; temp-file means ~700 MB peak + a /tmp file the size of the staged firmware. For a one-time-per-update operation, either is fine. Plan defaults to in-memory for simplicity; revisit if memory profile becomes a problem on Pi.

  4. Do we want to also take this opportunity to validate the staged file's MD5 against an authoritative FlexRadio source? Out of scope for this PR. FlexRadio doesn't publish per-file MD5s for .ssdr (only for the installer container). The UI keeps showing the MD5 we already compute on the staged file.

  5. Does this PR also include OLE CFB write support? — explicitly no. Read-only. We never need to create a compound file.

Acceptance criteria

The PR is mergeable when:

  • libmspack vendored at third_party/libmspack/ with LGPL-2.1 text and version-tracking README
  • OleCompoundFile.{h,cpp} reads streams from the v4.2.18 MSI
  • FirmwareStager::extractFromMsi() no longer calls QProcess or references 7-Zip
  • findExtractionTool() deleted (or stays for the InnoSetup fallback if still needed — verify)
  • Integration test extracts a .ssdr matching the known-good MD5 (9e8888dc0558ee420ed82f370f805025 for FLEX-6x00 v4.2.18)
  • Manual smoke test in the UI: Select Installer → MSI → staged file matches MD5
  • Build clean on Linux x86_64 (CI), macOS, and at least one ARM platform (Pi or aarch64 Linux)
  • CHANGELOG entry under the next release
  • THIRD_PARTY_LICENSES.md (new or updated) lists libmspack

Metadata

Labels: awaiting-response (waiting for reporter to provide additional information), enhancement (improvement to existing feature)
