Encoding issue - UTF-16 Character € in ANSI Encoded File #1107

@mdmcconnell

I've updated pdfcpu to the current state of the repository as of today. I'm running it on WSL/Ubuntu.

When I fill the attached form f8949lt.pdf using:
pdfcpu form fill f8949lt.pdf fields.json filled.pdf
The € character in the filled field causes the entire contents of that field to display as "☒" in Acrobat. Filling other PDFs with this character works fine, and manually typing the character into a field of this form also works.

I can only guess what the issue might be. Probably the file is not expecting non-ANSI characters, and maybe it has no appearance streams generated for UTF-16 (the font information indicates that all fonts are ANSI encoded). Ordinarily, I think pdfcpu generates appearance streams, but in this case perhaps it does not notice the missing characters. Or possibly pdfcpu writes the UTF-16 value, but it maps to a non-printable character in the ANSI appearance stream. That said, I have no idea why this would corrupt all of the characters in the field rather than just the €.
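To make the guess above a bit more concrete, here is a small Python sketch of the byte values involved. It is not pdfcpu code and says nothing about what pdfcpu does internally. It assumes that "ANSI" here means Windows-1252 (cp1252), and the helper names `describe` and `pdf_text_string` are made up for illustration.

```python
# Illustration only: how a character looks in the encodings discussed above.
# Assumption: "ANSI" means Windows-1252 (cp1252). Helper names are hypothetical.

EURO = "\u20ac"  # the € character


def describe(ch: str) -> dict:
    """Return the code point and its byte representation in each encoding."""
    try:
        cp1252 = ch.encode("cp1252").hex()
    except UnicodeEncodeError:
        cp1252 = None  # character has no cp1252 representation
    return {
        "code_point": f"U+{ord(ch):04X}",
        "utf16_be": ch.encode("utf-16-be").hex(),
        "cp1252": cp1252,
        # True if the code point fits in one byte without any mapping table
        "in_latin1": ord(ch) <= 0xFF,
    }


def pdf_text_string(s: str) -> bytes:
    """Encode s as UTF-16BE with a leading byte order mark (FE FF)."""
    return b"\xfe\xff" + s.encode("utf-16-be")


if __name__ == "__main__":
    print(describe(EURO))
    print(pdf_text_string(EURO).hex())
```

If that assumption holds, € does have a single-byte ANSI code (0x80), but its code point U+20AC does not fit in one byte, so a writer that converts code points to bytes without a cp1252 mapping table could end up with the wrong glyph code. Whether that is what happens here is only a guess.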

f8949lt.pdf

fields.json

filled.pdf
