Feature Request: Support for encrypted PDFs with empty user password #34

@mikeschinkel

I am filing this not as an expectation but as something I might implement in a PR if it becomes clear that I really need it. Right now only four (4) of my more than 7000 PDFs have this issue.

Summary

gxpdf fails to parse PDFs that use Standard Security Handler encryption, even when the user password is empty (no password required to view). These files open normally in Preview, Adobe Reader, and other PDF viewers.

Note: ROADMAP.md shows "Encryption (RC4, AES) | Done | v0.1.0" — this appears to be for creating encrypted PDFs. This request is for the inverse: decrypting encrypted PDFs when reading them.

Error

failed to decode ObjStm 2802: FlateDecode failed: failed to create zlib reader: zlib: invalid header

The "invalid zlib header" occurs because gxpdf attempts to decompress encrypted stream data without first decrypting it.
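
The failure mode is easy to reproduce outside gxpdf. The sketch below is not gxpdf code; `tryInflate` is a hypothetical helper that does what a FlateDecode filter does first, namely open a zlib reader on the raw stream bytes. The input is a stand-in for AES ciphertext, which almost never begins with a valid zlib header.

```go
package main

import (
	"bytes"
	"compress/zlib"
	"errors"
	"fmt"
)

// tryInflate (hypothetical name) opens a zlib reader on the raw stream bytes,
// which is where the header check happens.
func tryInflate(raw []byte) error {
	_, err := zlib.NewReader(bytes.NewReader(raw))
	return err
}

func main() {
	// Stand-in for still-encrypted stream data: the low nibble of the first
	// byte is not 8 (deflate), so the zlib header check fails immediately.
	ciphertext := []byte{0x9b, 0x41, 0xe2, 0x07, 0x5c, 0xd3, 0x18, 0xaa}
	err := tryInflate(ciphertext)
	fmt.Println(errors.Is(err, zlib.ErrHeader), err)
	// prints: true zlib: invalid header
}
```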

Technical Details

Affected PDFs have an /Encrypt dictionary like:

/Filter/Standard
/V 4
/R 4
/CF<</StdCF<</AuthEvent/DocOpen/CFM/AESV2/Length 16>>>>
/StmF/StdCF
/StrF/StdCF
/U(...) % user password entry for the empty password (mostly null bytes)
/O(...) % owner password hash
/P -1340 % permissions flags

Key characteristics:

  • /V 4 /R 4 — Encryption version/revision 4
  • /CFM/AESV2 — AES-128 encryption for streams
  • Empty user password — allows viewing without password prompt
  • Owner password — restricts editing/printing
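
To illustrate what /CFM/AESV2 means for stream data, here is a minimal sketch, assuming the layout the PDF specification describes for AESV2: a 16-byte initialization vector followed by AES-128-CBC ciphertext with block padding. The function names `decryptAESV2` and `encryptAESV2` are hypothetical and not part of gxpdf; the encrypt helper exists only so the example can round-trip.

```go
package main

import (
	"bytes"
	"crypto/aes"
	"crypto/cipher"
	"errors"
	"fmt"
)

// decryptAESV2 (hypothetical) decrypts data stored as a 16-byte IV followed by
// AES-CBC ciphertext, then strips the trailing padding bytes.
func decryptAESV2(key, data []byte) ([]byte, error) {
	if len(data) < 2*aes.BlockSize || len(data)%aes.BlockSize != 0 {
		return nil, errors.New("aesv2: data is not an IV plus whole blocks")
	}
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	iv, body := data[:aes.BlockSize], data[aes.BlockSize:]
	plain := make([]byte, len(body))
	cipher.NewCBCDecrypter(block, iv).CryptBlocks(plain, body)
	pad := int(plain[len(plain)-1]) // last byte gives the padding length
	if pad < 1 || pad > aes.BlockSize {
		return nil, errors.New("aesv2: invalid padding")
	}
	return plain[:len(plain)-pad], nil
}

// encryptAESV2 (hypothetical) is the inverse, used here only for the demo.
func encryptAESV2(key, iv, plain []byte) ([]byte, error) {
	block, err := aes.NewCipher(key)
	if err != nil {
		return nil, err
	}
	pad := aes.BlockSize - len(plain)%aes.BlockSize
	padded := append(append([]byte{}, plain...), bytes.Repeat([]byte{byte(pad)}, pad)...)
	out := make([]byte, aes.BlockSize+len(padded))
	copy(out, iv)
	cipher.NewCBCEncrypter(block, iv).CryptBlocks(out[aes.BlockSize:], padded)
	return out, nil
}

func main() {
	key := []byte("0123456789abcdef") // 16 bytes selects AES-128
	iv := []byte("fedcba9876543210")  // fixed IV for the demo only
	enc, _ := encryptAESV2(key, iv, []byte("stream bytes before FlateDecode"))
	dec, err := decryptAESV2(key, enc)
	fmt.Printf("%q %v\n", dec, err)
}
```

In a real reader the key passed in would be the per-object key rather than a literal, and the decrypted bytes would then go to FlateDecode.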

Use Case

Institutions such as banks, insurance companies, and financial services firms distribute "permissions-only" encrypted PDFs. These files:

  • Open without password in most PDF viewers
  • Restrict printing, copying, or editing
  • Are common in personal document archives

Proposed Solution

Implement PDF Standard Security Handler for /V 4 /R 4 encryption:

  1. Detection: Check trailer for /Encrypt reference
  2. Key derivation: Compute decryption key using:
    • Empty string as user password
    • /O and /P values from the /Encrypt dictionary, plus the first /ID element from the trailer
    • MD5 for revisions 2 through 4 (SHA-256 applies only to revisions 5 and 6)
  3. Stream decryption: Decrypt streams with AES-128-CBC before decompression
  4. String decryption: Decrypt string objects as needed
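
The key derivation step can be sketched as follows. This is an illustration of my reading of Algorithm 2 (file key) and Algorithm 1 (per-object key, AES variant) in ISO 32000-1 for /R 4, not gxpdf code; `fileKey`, `objectKey`, and the placeholder inputs in `main` are all hypothetical, and the output has not been checked against a real file.

```go
package main

import (
	"crypto/md5"
	"encoding/binary"
	"fmt"
)

// passwordPad is the fixed 32-byte padding string from the PDF specification.
var passwordPad = []byte{
	0x28, 0xBF, 0x4E, 0x5E, 0x4E, 0x75, 0x8A, 0x41, 0x64, 0x00, 0x4E, 0x56, 0xFF, 0xFA, 0x01, 0x08,
	0x2E, 0x2E, 0x00, 0xB6, 0xD0, 0x68, 0x3E, 0x80, 0x2F, 0x0C, 0xA9, 0xFE, 0x64, 0x53, 0x69, 0x7A,
}

// fileKey (hypothetical) derives the file encryption key for revisions 3 and 4.
// o is the /O string, p the /P integer, id0 the first element of the trailer /ID.
func fileKey(password, o []byte, p int32, id0 []byte, keyLen int, encryptMetadata bool) []byte {
	// Pad or truncate the password to exactly 32 bytes; empty means all padding.
	padded := append(append([]byte{}, password...), passwordPad...)[:32]
	h := md5.New()
	h.Write(padded)
	h.Write(o)
	var pb [4]byte
	binary.LittleEndian.PutUint32(pb[:], uint32(p)) // /P as unsigned, low byte first
	h.Write(pb[:])
	h.Write(id0)
	if !encryptMetadata {
		h.Write([]byte{0xff, 0xff, 0xff, 0xff}) // revision 4 only
	}
	sum := h.Sum(nil)
	for i := 0; i < 50; i++ { // 50 extra MD5 rounds for revision 3 and later
		next := md5.Sum(sum[:keyLen])
		sum = next[:]
	}
	return sum[:keyLen]
}

// objectKey (hypothetical) derives the per-object key used with AESV2.
func objectKey(fk []byte, objNum, gen uint32) []byte {
	buf := append([]byte{}, fk...)
	buf = append(buf, byte(objNum), byte(objNum>>8), byte(objNum>>16), byte(gen), byte(gen>>8))
	buf = append(buf, "sAlT"...) // extra salt bytes used only for AES
	sum := md5.Sum(buf)
	n := len(fk) + 5
	if n > 16 {
		n = 16
	}
	return sum[:n]
}

func main() {
	// Placeholder inputs; real values come from the /Encrypt dictionary and trailer.
	o := make([]byte, 32)
	id0 := []byte("hypothetical-id0")
	fk := fileKey(nil, o, -1340, id0, 16, true) // nil password = empty user password
	ok := objectKey(fk, 2802, 0)                // e.g. the ObjStm from the error above
	fmt.Println(len(fk), len(ok))               // prints: 16 16
}
```

A full implementation would also confirm that the empty password is correct by comparing against /U before attempting to decrypt anything.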

Optional enhancements:

  • Support /V 2 /R 3 (RC4-128) for older PDFs
  • Accept user-provided password for password-protected PDFs
  • Expose permission flags via API
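
As a rough idea of what exposing permission flags could look like, the sketch below decodes the /P integer into named booleans using the 1-based bit positions from the PDF specification's user access permissions table. The `Permissions` type and `decodePermissions` function are hypothetical names, not an existing gxpdf API.

```go
package main

import "fmt"

// Permissions (hypothetical) holds the user access permissions encoded in /P.
type Permissions struct {
	Print                bool // bit 3
	Modify               bool // bit 4
	Copy                 bool // bit 5
	Annotate             bool // bit 6
	FillForms            bool // bit 9
	ExtractAccessibility bool // bit 10
	Assemble             bool // bit 11
	PrintHighRes         bool // bit 12
}

// decodePermissions treats p as a 32-bit field and tests each 1-based bit.
func decodePermissions(p int32) Permissions {
	bit := func(n uint) bool { return uint32(p)&(1<<(n-1)) != 0 }
	return Permissions{
		Print:                bit(3),
		Modify:               bit(4),
		Copy:                 bit(5),
		Annotate:             bit(6),
		FillForms:            bit(9),
		ExtractAccessibility: bit(10),
		Assemble:             bit(11),
		PrintHighRes:         bit(12),
	}
}

func main() {
	// -1340 is the /P value from the example /Encrypt dictionary above.
	fmt.Printf("%+v\n", decodePermissions(-1340))
}
```

For the /P -1340 value shown earlier, this reports printing as allowed and modifying, copying, and annotating as disallowed, which matches the "permissions-only" use case.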

Impact

This would enable gxpdf to handle a class of real-world PDFs that currently fail with a misleading "zlib: invalid header" error.
