Optimization failure for merged PDFs containing ObjectStreamDict objects #897

@xelan

Description

Please ensure the following:

  • Your issue is based on the latest commit

✅ Tested with a build based on the current master

  • State your OS and OS version

✅ Debian 12

  • When reporting a problem with a specific PDF input file please avoid stating the organization responsible for the PDFWriter - just refer to the PDFWriter

The PDFWriter in use generates PDF files that contain object streams, which pdfcpu reports as types.ObjectStreamDict.

For example, an excerpt from the output of pdfcpu validate -vv 3.pdf:

object stream count:100 size of objectarray:100
  585:   offset=  237340 generation=0 types.ObjectStreamDict 
<<
        <Filter, FlateDecode>
        <First, 999>
        <Length, 9411>
        <N, 100>
        <Type, ObjStm>
>>

When this PDF is merged with other PDFs, the resulting PDF sometimes cannot be optimized. Any operation that performs optimization (such as pdfcpu bookmark import or, of course, pdfcpu optimize) then fails with an error:

Fatal: writeIndirectObject: undefined PDF object #988 types.ObjectStreamDict

The object number varies slightly between runs, even for the same files, when the combination of merge and optimize is run multiple times. Unfortunately, the issue is hard to reproduce: it only occurs with certain files merged in a specific order.

user@pc:/tmp/test$ pdfcpu merge out.pdf 1.pdf 2.pdf 3.pdf 4.pdf && pdfcpu optimize out.pdf
writing out.pdf...
1.pdf
2.pdf
3.pdf
4.pdf
optimizing...
writing out.pdf...
optimizing...
writeIndirectObject: undefined PDF object #988 types.ObjectStreamDict

user@pc:/tmp/test$ pdfcpu merge out.pdf 1.pdf 2.pdf 3.pdf 4.pdf && pdfcpu optimize out.pdf
writing out.pdf...
1.pdf
2.pdf
3.pdf
4.pdf
optimizing...
writing out.pdf...
optimizing...
writeIndirectObject: undefined PDF object #989 types.ObjectStreamDict

user@pc:/tmp/test$ pdfcpu merge out.pdf 1.pdf 2.pdf 3.pdf 4.pdf && pdfcpu optimize out.pdf
writing out.pdf...
1.pdf
2.pdf
3.pdf
4.pdf
optimizing...
writing out.pdf...
optimizing...
writeIndirectObject: undefined PDF object #988 types.ObjectStreamDict
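
To see how the failing object number is distributed over many runs, the merge and optimize commands above could be wrapped in a small loop. This is only a sketch: the function name repro_runs is hypothetical, and it assumes the error text has the form shown above. Both stdout and stderr are captured because it is not known which stream pdfcpu writes the error to.

```shell
#!/bin/sh
# Hypothetical helper (not part of pdfcpu): repeat "merge && optimize" and
# print the failing object number of every run that fails.
# Usage: repro_runs COUNT OUT.pdf IN1.pdf IN2.pdf ...
repro_runs() {
    count=$1
    out=$2
    shift 2
    i=1
    while [ "$i" -le "$count" ]; do
        # Same two commands as in the report; capture stdout and stderr.
        log=$(pdfcpu merge "$out" "$@" 2>&1 && pdfcpu optimize "$out" 2>&1) || true
        # Extract N from "undefined PDF object #N"; successful runs print nothing.
        printf '%s\n' "$log" |
            sed -n 's/.*undefined PDF object #\([0-9][0-9]*\).*/\1/p'
        i=$((i + 1))
    done
}
```

For example, `repro_runs 20 out.pdf 1.pdf 2.pdf 3.pdf 4.pdf | sort | uniq -c` would count how often each object number is reported.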

Log output and stack trace:

[...]/FontFile3 617 0 R/FontName/ArialMT/ItalicAngle 0/MaxWidth 2000/StemV 89/Type/FontDescriptor>><</BaseFont/Arial/DescendantFonts[988 1 R]/Encoding/Identity-H/Subtype/Type0/ToUnicode 989 1 R/Type/Font>>>
WRITE: 2024/06/24 14:11:25 writeObject end, obj#725 written to objectStream #996
WRITE: 2024/06/24 14:11:25 addToObjectStream end, obj#:725 gen#:0
WRITE: 2024/06/24 14:11:25 writeDeepObject: begin offset=585777
Arial
WRITE: 2024/06/24 14:11:25 writeDirectObject: end, direct obj - nothing written: offset=585777
Arial
WRITE: 2024/06/24 14:11:25 writeDeepObject: begin offset=585777
[(988 1 R)]
WRITE: 2024/06/24 14:11:25 writeDeepObject: begin offset=585777
(988 1 R)
TRACE: 2024/06/24 14:11:25 FindTableEntry: obj#:988 gen:1 
WRITE: 2024/06/24 14:11:25 writeIndirectObject: object #988 gets writeoffset: 585777
Fatal: writeIndirectObject: undefined PDF object #988 types.ObjectStreamDict

github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeObjectGeneric
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:689
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeIndirectObject
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:728
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeDeepObject
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:745
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeDirectObject
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:552
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeDeepObject
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:742
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeDeepDict
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:600
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeObjectGeneric
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:659
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeIndirectObject
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:728
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeDeepObject
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:745
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeDirectObject
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:538
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeDeepObject
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:742
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeDirectObject
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:538
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeDeepObject
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:742
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeDeepDict
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:600
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeObjectGeneric
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:659
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeIndirectObject
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:728
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeDeepObject
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:745
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeDeepArray
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:638
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeObjectGeneric
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:665
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeIndirectObject
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:728
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeDeepObject
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:745
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeDeepDict
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:600
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeObjectGeneric
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:659
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeIndirectObject
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:728
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeDeepObject
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:745
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeDirectObject
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:552
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeDeepObject
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:742
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeDeepDict
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:600
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeObjectGeneric
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:659
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeIndirectObject
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:728
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeDeepObject
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:745
github.com/pdfcpu/pdfcpu/pkg/pdfcpu.writeDirectObject
	/root/pdfcpu/pkg/pdfcpu/writeObjects.go:552

In this case, the object causing the problem appears to belong to the embedded Arial font. Unfortunately, I cannot provide an example file in which the issue occurs, and I don't know how to artificially produce a PDF with the embedded font as an ObjectStreamDict.

If I run optimize on the individual files before the merge, the issue does not occur.

user@pc:/tmp/test$ pdfcpu optimize 1.pdf 
writing 1.pdf...
optimizing...
user@pc:/tmp/test$ pdfcpu optimize 2.pdf 
writing 2.pdf...
optimizing...
user@pc:/tmp/test$ pdfcpu optimize 3.pdf 
writing 3.pdf...
optimizing...
user@pc:/tmp/test$ pdfcpu optimize 4.pdf 
writing 4.pdf...
optimizing...
user@pc:/tmp/test$ pdfcpu merge out.pdf 1.pdf 2.pdf 3.pdf 4.pdf 
writing out.pdf...
1.pdf
2.pdf
3.pdf
4.pdf
optimizing...
user@pc:/tmp/test$ pdfcpu optimize out.pdf 
writing out.pdf...
optimizing...
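
The workaround shown in this session could be scripted roughly as follows. This is a sketch under assumptions, not a verified fix: the function name merge_preoptimized is hypothetical, and it works on temporary copies because pdfcpu optimize rewrites the file it is given, as the "writing 1.pdf..." lines above show.

```shell
#!/bin/sh
# Hypothetical helper (not part of pdfcpu): optimize a copy of every input
# first, then merge the copies and optimize the merged result.
# Usage: merge_preoptimized OUT.pdf IN1.pdf IN2.pdf ...
merge_preoptimized() {
    out=$1
    shift
    tmp=$(mktemp -d) || return 1
    n=0
    for f in "$@"; do
        n=$((n + 1))
        copy="$tmp/$n.pdf"
        # Optimize a copy so the original input file stays untouched.
        cp "$f" "$copy" && pdfcpu optimize "$copy" || { rm -rf "$tmp"; return 1; }
        # Rotate the argument list: append the copy, drop the original.
        set -- "$@" "$copy"
        shift
    done
    # "$@" now holds the optimized copies in the original order.
    pdfcpu merge "$out" "$@" && pdfcpu optimize "$out"
    status=$?
    rm -rf "$tmp"
    return "$status"
}
```

It would be called as `merge_preoptimized out.pdf 1.pdf 2.pdf 3.pdf 4.pdf`. The input order is preserved, which matters here because the failure only appears for a specific order.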
