Feature Request: Font data extraction for round-trip preservation #67

@joa23

What's Missing

Currently, ExtractText() returns the font name and size for each text element, but there is no way to extract the embedded font data (TrueType/OpenType binary) from the PDF. As a result, creating a new PDF that matches the style of an existing one requires substituting system fonts, which causes text overflow, different kerning, and visual differences.

Proposed API

type EmbeddedFont struct {
    Name     string   // Font name as referenced in content streams
    Family   string   // Font family
    Type     string   // "TrueType", "Type1", "OpenType", "CIDFont"
    Data     []byte   // Raw font binary (TTF/OTF)
    Encoding string   // "WinAnsiEncoding", "Identity-H", etc.
}

func (r *Reader) GetEmbeddedFonts() ([]EmbeddedFont, error)
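
To illustrate how a caller might consume the result, here is a minimal sketch. It assumes the proposed `EmbeddedFont` struct above (redeclared locally so the example compiles on its own) and uses a hand-built slice in place of a real `GetEmbeddedFonts()` call, since that method does not exist yet. The helper `sfntExt` is hypothetical: it guesses a file extension from the first four bytes of the font data. Fonts embedded as bare CFF or Type 1 programs carry no sfnt header, so they fall through to `.bin` here.

```go
package main

import (
	"bytes"
	"fmt"
)

// EmbeddedFont mirrors the struct proposed in this issue.
// It is redeclared here for illustration; it is not an existing library type.
type EmbeddedFont struct {
	Name     string
	Family   string
	Type     string
	Data     []byte
	Encoding string
}

// sfntExt (hypothetical helper) guesses a file extension from the
// 4-byte version tag at the start of an sfnt-wrapped font.
func sfntExt(data []byte) string {
	if len(data) < 4 {
		return ".bin"
	}
	tag := data[:4]
	switch {
	case bytes.Equal(tag, []byte{0x00, 0x01, 0x00, 0x00}),
		bytes.Equal(tag, []byte("true")):
		return ".ttf" // TrueType outlines
	case bytes.Equal(tag, []byte("OTTO")):
		return ".otf" // OpenType with CFF outlines
	case bytes.Equal(tag, []byte("ttcf")):
		return ".ttc" // font collection
	default:
		return ".bin" // bare CFF, Type 1, or unknown data
	}
}

func main() {
	// Stand-in for the slice that r.GetEmbeddedFonts() would return.
	// The names and byte contents below are made up for the example.
	fonts := []EmbeddedFont{
		{Name: "F1", Family: "ExampleSans", Type: "TrueType",
			Data: []byte{0x00, 0x01, 0x00, 0x00, 0x00, 0x0c}, Encoding: "WinAnsiEncoding"},
		{Name: "F2", Family: "ExampleSerif", Type: "OpenType",
			Data: []byte("OTTO\x00\x0a"), Encoding: "Identity-H"},
	}

	for _, f := range fonts {
		// A real caller would write f.Data to this file name with os.WriteFile.
		fmt.Printf("%s -> %s%s (%d bytes)\n", f.Name, f.Family, sfntExt(f.Data), len(f.Data))
	}
}
```

Sniffing the header rather than trusting the `Type` field is a deliberate choice in this sketch: the PDF font subtype and the format of the embedded program do not always line up one to one.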

Why This Matters

Font substitution is one of the top causes of visual differences when recreating PDFs. Access to the embedded fonts would let developers produce output that closely matches the original, with correct metrics, kerning, and layout.

Thank you for the excellent library! 🙏
