What's Missing
Currently, `ExtractText()` returns the font name and size for each text element, but there is no way to extract the embedded font data (the TrueType/OpenType binary) from the PDF. As a result, when creating new PDFs that need to match the style of an existing one, you have to substitute system fonts, which causes text overflow, different kerning, and other visual differences.
Proposed API
```go
type EmbeddedFont struct {
	Name     string // Font name as referenced in content streams
	Family   string // Font family
	Type     string // "TrueType", "Type1", "OpenType", "CIDFont"
	Data     []byte // Raw font binary (TTF/OTF)
	Encoding string // "WinAnsiEncoding", "Identity-H", etc.
}

func (r *Reader) GetEmbeddedFonts() ([]EmbeddedFont, error)
```
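One possible way to populate the `Type` field, sketched under assumptions: the PDF specification stores the font program in the font descriptor under `/FontFile` (Type 1), `/FontFile2` (TrueType), or `/FontFile3` with a `/Subtype` of `Type1C`, `CIDFontType0C`, or `OpenType`. Sniffing the leading bytes of the decoded stream could serve as a fallback. The helpers `typeFromDescriptorKey` and `sniffFontFormat` are hypothetical, not part of the library, and the sketch assumes the stream has already been decompressed. Note that Type 1 and bare CFF programs are not TTF/OTF files, so `Data` may not always be directly usable as a font file.

```go
package main

import (
	"bytes"
	"fmt"
)

// typeFromDescriptorKey (hypothetical) maps the FontDescriptor entry holding
// the font program to a coarse format name. fontFile3Subtype is the /Subtype
// of the FontFile3 stream and is ignored for the other keys.
func typeFromDescriptorKey(key, fontFile3Subtype string) string {
	switch key {
	case "FontFile":
		return "Type1" // Type 1 font program
	case "FontFile2":
		return "TrueType" // TrueType font program
	case "FontFile3":
		switch fontFile3Subtype {
		case "Type1C":
			return "Type1C" // bare CFF, simple font
		case "CIDFontType0C":
			return "CIDFontType0C" // bare CFF, CID-keyed font
		case "OpenType":
			return "OpenType"
		}
	}
	return "Unknown"
}

// sniffFontFormat (hypothetical) guesses the format from the leading bytes of
// a decoded font program. It is a heuristic fallback, not a validating parser.
func sniffFontFormat(data []byte) string {
	switch {
	case bytes.HasPrefix(data, []byte{0x00, 0x01, 0x00, 0x00}), bytes.HasPrefix(data, []byte("true")):
		return "TrueType" // sfnt container with TrueType outlines
	case bytes.HasPrefix(data, []byte("OTTO")):
		return "OpenType" // sfnt container with CFF outlines
	case bytes.HasPrefix(data, []byte("%!PS-AdobeFont")), bytes.HasPrefix(data, []byte("%!FontType1")):
		return "Type1" // cleartext header of a Type 1 program
	case len(data) >= 4 && data[0] == 0x01 && data[1] == 0x00:
		return "CFF" // bare CFF header: major version 1, minor version 0
	default:
		return "Unknown"
	}
}

func main() {
	fmt.Println(typeFromDescriptorKey("FontFile2", ""))          // TrueType
	fmt.Println(typeFromDescriptorKey("FontFile3", "OpenType"))  // OpenType
	fmt.Println(sniffFontFormat([]byte("OTTO\x00\x0a")))         // OpenType
	fmt.Println(sniffFontFormat([]byte{0x01, 0x00, 0x04, 0x02})) // CFF
}
```

Preferring the descriptor key and only sniffing bytes when it is missing or ambiguous keeps the result consistent with what the PDF itself declares.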
Why This Matters
Font substitution is one of the top causes of visual differences when recreating PDFs. Having access to the embedded fonts would let developers produce output that closely matches the original — correct metrics, correct kerning, correct layout.
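One caveat when reusing embedded fonts for new text: PDFs frequently embed only a subset containing the glyphs the original document used. The PDF specification marks such fonts by prefixing the BaseFont name with exactly six uppercase letters and a plus sign (e.g. `EOODIA+Poetica`), so exposing whether a font is a subset, and the untagged name for `Family`, may be worth including. A minimal sketch of detecting the tag; `splitSubsetTag` is a hypothetical helper, not a library function:

```go
package main

import "fmt"

// splitSubsetTag (hypothetical) separates a subset tag from a PDF BaseFont
// name. A tag is exactly six uppercase ASCII letters followed by '+'.
// Names without a valid tag are returned unchanged with subset == false.
func splitSubsetTag(baseFont string) (tag, name string, subset bool) {
	// Need six tag letters, the '+', and at least one name character.
	if len(baseFont) < 8 || baseFont[6] != '+' {
		return "", baseFont, false
	}
	for i := 0; i < 6; i++ {
		if c := baseFont[i]; c < 'A' || c > 'Z' {
			return "", baseFont, false
		}
	}
	return baseFont[:6], baseFont[7:], true
}

func main() {
	for _, n := range []string{"EOODIA+Poetica", "Helvetica-Bold", "abcdef+Foo"} {
		tag, name, subset := splitSubsetTag(n)
		fmt.Printf("%-16s tag=%q name=%q subset=%v\n", n, tag, name, subset)
	}
}
```

Text containing characters outside a subset font would still need a fallback font, so a caller would likely want to check this before reusing the embedded data.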
Thank you for the excellent library! 🙏