Support for short dash array (invalid border array: [0 0 1 [3]]) #766

@dpb587

Hello - I was reading a PDF file and ran into the following error during an info/api.PDFInfo call:

invalid border array: [0 0 1 [3]]

tl;dr: this seems like a spec-compliant representation, so the package should support it.

Reproduction

pdfcpu

$ git clone https://github.com/pdfcpu/pdfcpu
$ cd pdfcpu
$ git checkout v0.6.0

sample (PDF/1.1, producer Acrobat PDFWriter 2.0 for Windows, circa 1995)

$ curl -o "$TMPDIR/sample.pdf" https://www.iptc.org/std/IIM/3.0/specification/IIMV3.PDF
$ echo "7799f6fef4308db9f671ba40e4acfebd1ecea943e295e03d5733c8d650539ad9 $TMPDIR/sample.pdf" | sha256sum -c

error

$ go run ./cmd/pdfcpu info "$TMPDIR/sample.pdf"
invalid border array: [0 0 1 [3]]
exit status 1

Investigation

From some manual debugging of the file, the error first occurs on page 6, in an annotation of subtype Link. The full stack to the failing expectation is included below.

Debug Stack
goroutine 1 [running]:
github.com/pdfcpu/pdfcpu/pkg/pdfcpu/validate.validateBorderArray(0xc00019a1e0, {0xc000696e00?, 0x15bec4f?, 0x9?})
	./pkg/pdfcpu/validate/annotation.go:1418 +0xb2
github.com/pdfcpu/pdfcpu/pkg/pdfcpu/validate.validateAnnotationDictGeneralPart2(0xc00019a1e0?, 0xc0001932f8?, {0x15bec4f, 0x9})
	./pkg/pdfcpu/validate/annotation.go:1497 +0x17f
github.com/pdfcpu/pdfcpu/pkg/pdfcpu/validate.validateAnnotationDictGeneral(0xc00037cbf0?, 0x16ae260?, {0x15bec4f, 0x9})
	./pkg/pdfcpu/validate/annotation.go:1520 +0x5c
github.com/pdfcpu/pdfcpu/pkg/pdfcpu/validate.validateAnnotationDict(0x1583da0?, 0x16ae260?)
	./pkg/pdfcpu/validate/annotation.go:1602 +0x33
github.com/pdfcpu/pdfcpu/pkg/pdfcpu/validate.validatePageAnnotations(0xc00019a1e0, 0x15bbf7b?)
	./pkg/pdfcpu/validate/annotation.go:1667 +0x2bf
github.com/pdfcpu/pdfcpu/pkg/pdfcpu/validate.validatePagesAnnotations(0xc00019a1e0, 0x15bbf7b?, 0x0)
	./pkg/pdfcpu/validate/annotation.go:1745 +0x2bc
github.com/pdfcpu/pdfcpu/pkg/pdfcpu/validate.validatePagesAnnotations(0xc00019a1e0, 0x15bbf7b?, 0x0)
	./pkg/pdfcpu/validate/annotation.go:1737 +0x2f9
github.com/pdfcpu/pdfcpu/pkg/pdfcpu/validate.validatePagesAnnotations(0xc00019a1e0, 0x70000c000001618?, 0x0)
	./pkg/pdfcpu/validate/annotation.go:1737 +0x2f9
github.com/pdfcpu/pdfcpu/pkg/pdfcpu/validate.validateRootObject(0xc00019a1e0)
	./pkg/pdfcpu/validate/xReftable.go:928 +0x3d0
github.com/pdfcpu/pdfcpu/pkg/pdfcpu/validate.XRefTable(0xc00019a1e0)
	./pkg/pdfcpu/validate/xReftable.go:44 +0xf0
github.com/pdfcpu/pdfcpu/pkg/api.readAndValidate({0x16ad518?, 0xc000012530?}, 0xc0001a1930, {0xc000193a20?, 0x10b61a5?, 0x1a1df80?})
	./pkg/api/api.go:133 +0xea
github.com/pdfcpu/pdfcpu/pkg/api.PDFInfo({0x16ad518, 0xc000012530}, {0x7ff7bfeffad1, 0x3c}, {0x0, 0x0, 0x0}, 0xc000193a88?)
	./pkg/api/info.go:42 +0xad
github.com/pdfcpu/pdfcpu/pkg/cli.ListInfoFile({0x7ff7bfeffad1, 0x3c}, {0x0, 0x0, 0x0}, 0x10b412c?)
	./pkg/cli/list.go:279 +0x10f
github.com/pdfcpu/pdfcpu/pkg/cli.ListInfoFiles({0xc0001de2f0?, 0x1, 0x104a312?}, {0x0, 0x0, 0x0}, 0xe0?, 0x10c945e?)
	./pkg/cli/list.go:345 +0x233
github.com/pdfcpu/pdfcpu/pkg/cli.ListInfo(0x1549e60?)
	./pkg/cli/cli.go:193 +0x45
github.com/pdfcpu/pdfcpu/pkg/cli.Process(0xc000024580)
	./pkg/cli/process.go:35 +0xba
main.process(0xc0001de2f0?)
	./cmd/pdfcpu/process.go:149 +0x1d
main.processInfoCommand(0xc0001a1930)
	./cmd/pdfcpu/process.go:1441 +0x40a
main.commandMap.process(0xc00008c058?, {0x7ff7bfeffacc, 0x4}, {0x0, 0x0})
	./cmd/pdfcpu/cmd.go:143 +0x342
main.main()
	./cmd/pdfcpu/main.go:56 +0xaf

Reviewing the PDF Reference Manual, Version 1.1, I see the following relevant pieces...

Page 76 (on the Border annotation attribute) describes PDF 1.1 introducing a fourth element, a dash array. annotation.go appears to support that element already, but the code expects the dash array to contain exactly 2 items. Interestingly, the example given in the manual is exactly what the sample file uses:

An example of a border with a dash array is [ 0 0 1 [ 3 ] ].

Page 147 formally describes the setdash operator (of which the array is the optional, fourth border element).

Sets the dash pattern parameter in the graphics state. If array is empty, the dash pattern is a solid, unbroken line, otherwise array is an array of numbers, all non-negative and at least one non-zero, that specifies distances in user space for the length of dashes and gaps. phase is a number that specifies a distance in user space into the dash pattern at which to begin marking the path. The default dash pattern is a solid line.
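The constraints in that passage can be written down as a small check. This is only a sketch of my reading of the quoted text, not code from pdfcpu; the function name validDashArray and the use of []float64 are assumptions made for illustration.

```go
package main

import "fmt"

// validDashArray is a hypothetical helper (not part of pdfcpu). It applies the
// setdash wording quoted above: an empty array is allowed (solid line);
// otherwise every entry must be non-negative and at least one must be non-zero.
func validDashArray(nums []float64) bool {
	if len(nums) == 0 {
		return true
	}
	nonZero := false
	for _, n := range nums {
		if n < 0 {
			return false
		}
		if n > 0 {
			nonZero = true
		}
	}
	return nonZero
}

func main() {
	fmt.Println(validDashArray([]float64{3}))     // single-item array, as in the sample file
	fmt.Println(validDashArray([]float64{0, 0}))  // all zero
	fmt.Println(validDashArray([]float64{2, -1})) // negative entry
}
```

Note that nothing in the quoted wording puts a lower bound of two items on the array, which is the point of this issue.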

Page 144 gives several examples, including single-item arrays, which it describes as equal on/off spans:

  • ------------- from [] 0 as turn dash off -- solid line
  • --- --- - from [3] 0 as 3 units on, 3 units off, ...
  • - -- -- -- from [2] 1 as 1 on, 2 off, 2 on, 2 off, ...
  • -- -- -- -- - from [2 1] 0 as 2 on, 1 off, 2 on, 1 off, ...
  • [others omitted]

The descriptions of the array are somewhat ambiguous about the maximum number of items, and the text suggests the array is simply cycled through for dash and gap lengths. However, I can only find examples of 0-, 1-, and 2-length arrays (including in the PDF 1.7 / ISO 32000-1:2008 reference).
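To make the "cycle through the array" reading concrete, here is a minimal sketch that expands a dash array and phase into on/off spans. It is an illustration of that interpretation only; the segment type and expandDash function are hypothetical names, and rejecting a negative phase is my own assumption rather than something the quoted text states.

```go
package main

import (
	"fmt"
	"math"
)

// segment is a hypothetical type for illustration: a span of Length units
// that is either painted (On) or skipped.
type segment struct {
	On     bool
	Length float64
}

// expandDash returns the first n on/off spans for a dash array and phase,
// cycling through the array as often as needed. It returns nil for an empty
// array (solid line) and for input this sketch treats as invalid.
func expandDash(dash []float64, phase float64, n int) []segment {
	sum := 0.0
	for _, d := range dash {
		if d < 0 {
			return nil
		}
		sum += d
	}
	if sum == 0 || phase < 0 || n <= 0 {
		return nil
	}
	// 2*sum is a whole number of on/off periods for odd and even lengths alike.
	phase = math.Mod(phase, 2*sum)
	i, on := 0, true
	// Skip the spans that the phase has already consumed.
	for phase >= dash[i] {
		phase -= dash[i]
		i = (i + 1) % len(dash)
		on = !on
	}
	segs := make([]segment, 0, n)
	for len(segs) < n {
		segs = append(segs, segment{On: on, Length: dash[i] - phase})
		phase = 0
		i = (i + 1) % len(dash)
		on = !on
	}
	return segs
}

func main() {
	// The "[2] 1" example from the list above.
	for _, s := range expandDash([]float64{2}, 1, 4) {
		fmt.Printf("on=%v length=%v\n", s.On, s.Length)
	}
}
```

Under this reading a single-item array such as [3] needs no special handling: the one value is reused for both the dash and the gap.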

I know this package doesn't rasterize but, for what it's worth, on a Mac the annotation(s) were rendered as follows:

  • Adobe Acrobat -- renders a dashed green border
  • Preview (OS-native) -- renders a solid green border (seems like it always renders link borders solid?)
  • Firefox (120.0.1) embedded viewer -- renders a dashed green border
  • Chrome (120.0.6099.129) embedded viewer -- renders no border (seems like it never renders link borders?)

Proposal

Change validation to the following:

diff --git a/pkg/pdfcpu/validate/annotation.go b/pkg/pdfcpu/validate/annotation.go
index 5ba27b7..91de8cf 100644
--- a/pkg/pdfcpu/validate/annotation.go
+++ b/pkg/pdfcpu/validate/annotation.go
@@ -1408,7 +1408,7 @@ func validateBorderArray(xRefTable *model.XRefTable, a types.Array) bool {
        if !ok {
                return xRefTable.ValidationMode == model.ValidationRelaxed
        }
-       if len(a1) != 2 {
+       if len(a1) > 2 {
                return false
        }
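For discussion, the effect of the relaxed length check can be sketched outside the codebase. The following is a standalone approximation and does not use pdfcpu's types or reproduce validateBorderArray; the validBorder name, the []any representation, and the handling of the first three elements are all assumptions for illustration.

```go
package main

import "fmt"

// validBorder is a hypothetical stand-in, not pdfcpu code. It accepts a
// border of three numbers with an optional fourth element, a dash array of
// at most two numbers (the relaxed check proposed in the diff).
func validBorder(border []any) bool {
	if len(border) != 3 && len(border) != 4 {
		return false
	}
	for _, v := range border[:3] {
		if _, ok := v.(float64); !ok {
			return false
		}
	}
	if len(border) == 3 {
		return true
	}
	dash, ok := border[3].([]float64)
	if !ok {
		return false
	}
	// Previously the equivalent of len(dash) != 2; now 0, 1, or 2 items pass.
	return len(dash) <= 2
}

func main() {
	// [0 0 1 [3]], the border from the sample file.
	fmt.Println(validBorder([]any{0.0, 0.0, 1.0, []float64{3}}))
	// A three-item dash array is still rejected by this check.
	fmt.Println(validBorder([]any{0.0, 0.0, 1.0, []float64{1, 2, 3}}))
}
```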

This then allows the info call to succeed:

$ go run ./cmd/pdfcpu info "$TMPDIR/sample.pdf" | grep Page
          Page count: 49
           Page size: 595.00 x 842.00 points

I'm not too familiar with other ways this might affect the codebase, but I didn't see potential side effects from a quick look.

It's not clear to me if/when ValidationRelaxed should be respected, but it may be an alternative for this condition, too.

I saw the project prefers issues before pull requests, but I pushed the one-liner to a branch in case you want to merge it as is. If you have some guidance on how you'd test this, I'd be happy to update the branch and/or send a formal PR.

Miscellaneous

  • Thank you for this package. I originally found it when searching for a library that offered low-level, spec-compliant models that I could build on and extend for analysis tools.
  • validate: invalid border array #711 -- semi-recent border validation change, but caused by non-compliant annotations
