v0.8.2

@iv7dev

v0.8.2

Follow-up patch for #74, fixing two additional issues discovered by testing with real-world shipping label PDFs provided by @iv7dev.

Bug Fixes

Per-glyph text merging (wkhtmltopdf)

PDFs generated by wkhtmltopdf 0.12 / Qt 4.8 emit one Tj operator per glyph with Td repositioning between each character. This caused text extraction to produce "D O M I C I L I O" instead of "DOMICILIO".

Fix: New mergeAdjacentElements() collapses consecutive single-glyph TextElements that share the same font, size, and text line into word-level elements. New AssembleText() replaces the naive elem.Text + " " concatenation with spatial gap detection — spaces are inserted only when the horizontal gap exceeds the font size threshold, and newlines on Y-coordinate changes.

Form XObject text extraction (TCPDF)

PDFs generated by TCPDF 5.9 place all page content inside Form XObjects invoked via the Do operator. Since Do was not handled by the text extractor, all pages with CID fonts returned empty strings.

Fix: New processFormXObject() resolves named XObjects from page resources, pushes the XObject's own /Resources dictionary onto a stack (so CID font decoders are correctly loaded from the XObject scope), processes operators recursively, and pops resources on return. An 8-level depth guard prevents infinite recursion on malformed PDFs.

Verified On

PDF	Generator	Fonts	Before	After
Andreani shipping label (12 pages)	wkhtmltopdf 0.12	Roboto-Medium/Bold, CID TrueType, Identity-H	`"D O M I C I L I O"`	`"DOMICILIO"`
Correo Argentino label (5 pages)	TCPDF 5.9	DejaVuSans-Bold/Oblique (subset), CID TrueType, Identity-H	`""` (empty)	`"CLASICO\nF.PAGADO 0001571918..."`

Stats

5 files changed, +1393 lines (including 42 new tests with real-world PDF fixtures)
Zero breaking changes

Install

go get github.com/coregx/[email protected]

@iv7dev

v0.8.1

Patch release fixing CID TrueType font text extraction.

Bug Fix

CID TrueType text extraction now works correctly (#74, reported by @iv7dev)

ExtractTextFromPage() previously returned raw CID glyph IDs (\x01 \x02 \x03) or empty strings for PDFs with CID TrueType fonts using Identity-H encoding — common in documents generated by wkhtmltopdf, TCPDF, and similar tools. Now correctly decodes to Unicode text.

What was fixed

beginbfrange array format — ToUnicode CMap entries in [<dst0> <dst1> ...] array form were silently skipped, producing an empty mapping table. Now fully parsed.
Identity-H 2-byte detection — all loadFontDecoder fallback paths now correctly enable 2-byte glyph mode for Identity-H/V and Type0 composite fonts.
begincodespacerange parsing — byte width is now determined from the PDF's codespace range declaration (the authoritative signal per PDF spec §9.7.5).
UTF-16BE surrogate pairs — CMap destinations encoding supplementary Unicode characters (U+10000+) are now correctly decoded via utf16.DecodeRune.
Composite font guard — the looksLikeGarbage heuristic no longer downgrades Type0 composite fonts from 2-byte to 1-byte decoding.

Affected PDFs

Any PDF using CID TrueType fonts with Identity-H encoding — this includes most PDFs generated by:

wkhtmltopdf (Roboto, DejaVu Sans, etc.)
TCPDF (DejaVuSans subsets)
Prince XML
Many CJK-capable PDF generators

Stats

4 files changed, +891 lines (including 33 new tests)
Zero breaking changes

Install

go get github.com/coregx/[email protected]

@yurikilian

v0.8.0 "Extraction & Access"

Community-driven release delivering three major extraction capabilities requested by @joa23, plus a bug fix from @yurikilian.

Highlights

Positioned Text Extraction (#68)

Extract text with full positional metadata — coordinates, dimensions, font name, and font size:

elements, _ := doc.ExtractTextElementsFromPage(1)
for _, e := range elements {
    fmt.Printf("%q at (%.1f, %.1f) font=%s size=%.1f\n",
        e.Text, e.X, e.Y, e.FontName, e.FontSize)
}

In-Memory PDF Opening (#68)

Open PDFs from []byte without writing to disk — ideal for server-side workflows:

data, _ := io.ReadAll(httpResponse.Body)
doc, _ := gxpdf.OpenFromBytes(data)
defer doc.Close()

Also supports encrypted PDFs: OpenFromBytesWithPassword(data, "secret").

Embedded Font Extraction (#67)

Extract TTF/OTF font binary data for round-trip preservation:

fonts, _ := doc.GetEmbeddedFonts()
for _, f := range fonts {
    fmt.Printf("Font: %s (%s), %d bytes\n", f.Name, f.Subtype, len(f.Data))
}

Supports TrueType and Type0/CIDFontType2 fonts. Extracted data can be fed back into the Creator API via fonts.LoadTTFFromBytes().

Vector Graphics Extraction (#66)

Extract paths, bezier curves, colors, and opacity from parsed PDFs:

paths, _ := doc.GetVectorGraphicsForPage(1)
for _, p := range paths {
    fmt.Printf("Path: %d verbs, mode=%v, stroke=%v\n",
        len(p.Verbs), p.PaintMode, p.StrokeColor)
}

Features:

Path verb + coordinates model (MoveTo, LineTo, CubicTo, QuadTo, Close) — compatible with gogpu/gg
Graphics state stack (q/Q), CTM tracking (cm), CMYK→RGB conversion
Stroke, fill, and fill-stroke paint modes with separate colors and opacity
Line cap, line join, miter limit extraction

Bug Fixes

uint16 overflow in FontSubset.MeasureString — long strings no longer produce incorrect width calculations, fixing text wrapping in Builder API (#69, @yurikilian)

Internal Changes

Parser Reader refactored from *os.File to io.ReadSeeker interface
Shared decodeStreamData() utility for FlateDecode decompression
New VectorParser separate from table detection GraphicsParser

Stats

+4,000 lines of new production code
130 new tests across 6 packages
All CI checks passing (Ubuntu, macOS, Windows)

Contributors

@yurikilian — first contribution! (#69)
@joa23 — feature requests that shaped this release (#66, #67, #68)

Install

go get github.com/coregx/[email protected]

v0.7.0 "Builder & Signatures"

The largest feature release in GxPDF history — ~13,500 lines of new code across 3 major features.

Highlights

Declarative Builder API

QuestPDF-inspired document builder with 12-column grid, automatic pagination, and composable Go closures. Users import only builder/ — no internal packages leak.

Enterprise Tables

Full-featured tables with ColSpan, RowSpan, repeating headers on overflow pages, page splitting, Auto/Fixed/Fr column widths, cell padding/borders/backgrounds, and zebra stripes.

Rich Text

Multi-style inline text — mix bold, italic, colors, and links within a single paragraph. Baseline alignment for mixed font sizes, justified text with proportional spacing.

Digital Signatures (PAdES B-B + B-T)

Sign and verify PDFs with zero external dependencies. RSA + ECDSA support, CMS/PKCS#7 with ESS signing-certificate-v2, RFC 3161 timestamping, incremental PDF update preserving existing signatures.

What's New

builder/ — Declarative Builder API (12-col grid, Row/Col, Text, Tables, RichText, Images, Lines, Spacers)
layout/ — Pure computation layout engine (zero PDF dependencies)
signature/ — Digital signatures (PAdES B-B/B-T, sign + verify)
Enterprise Tables — ColSpan, RowSpan, header repeat, page split, Auto/Fixed/Pct/Fr columns
Rich Text — RichText() with Span() and Link() for mixed inline styles
Text Measurement API — MeasureText(), FontAscender(), FontLineHeight() exported
Builder-owned types — Value, Color, Size (no layout/ import needed)
Page break control — PageBreak(), KeepTogether(), EnsureSpace()
Page numbers — Two-pass placeholder resolution
13 predefined colors + Hex() parser

Bug Fixes

Half-leading for optically centered text in line boxes (CSS model)
Pct double-resolution fix in nested box layout
Floating-point epsilon in text overflow check

Stats

Package	Production	Tests	Coverage
`layout/`	~2,600	~2,500	85.7%
`builder/`	~2,600	~2,200	80.6%
`signature/`	1,886	1,268	80.7%

Full Changelog: v0.6.0...v0.7.0

GxPDF v0.6.0 — Encrypted PDF Reading & Gradient Rendering

Two major features in this release: transparent reading of password-protected PDFs and full gradient rendering via PDF Shading dictionaries.

Highlights

Encrypted PDF Reading — open password-protected PDFs with Open() or OpenWithPassword() (#34)
Full Gradient Rendering — linear and radial gradients now render as real color transitions, not solid colors (#57)
ExtGState Fix — shape and text opacity produce valid PDF output (#46, #47)

What's New

Encrypted PDF Reading (#34)

Read PDFs encrypted with Standard Security Handler:

Algorithm	Version	Status
RC4 40-bit	V=1, R=2	Supported
RC4 128-bit	V=2, R=3	Supported
AES-128	V=4, R=4	Supported

Open() transparently handles empty-password encrypted PDFs (permissions-only — most common case)
OpenWithPassword() for PDFs with non-empty user passwords
ErrPasswordRequired sentinel error for wrong/missing password

// Permissions-only PDFs open transparently
doc, _ := gxpdf.Open("bank_statement.pdf")

// Password-protected PDFs
doc, err := gxpdf.OpenWithPassword("protected.pdf", "secret")
if errors.Is(err, gxpdf.ErrPasswordRequired) {
    log.Fatal("Wrong password")
}

Full Gradient Rendering (#57)

Gradients now render as real PDF Shading objects instead of solid-color fallback:

Linear gradients — ShadingType 2 (axial) with multi-stop support
Radial gradients — ShadingType 3 with focal point control
Multi-stop — Type 3 stitching functions for 3+ color stops
All shapes — rectangles, circles, ellipses, polygons, Bezier curves

grad := creator.NewLinearGradient(50, 650, 250, 650)
grad.AddColorStop(0, creator.Red)
grad.AddColorStop(0.5, creator.Yellow)
grad.AddColorStop(1, creator.Green)

page.DrawRect(50, 620, 200, 60, &creator.RectOptions{
    FillGradient: grad,
})

What's Fixed

ExtGState Object Creation — opacity on shapes and text now produces valid PDF indirect objects (#46, #47)

Installation

# CLI tool:
go install github.com/coregx/gxpdf/cmd/[email protected]

# Library:
go get github.com/coregx/[email protected]

Full Changelog: https://github.com/coregx/gxpdf/blob/main/CHANGELOG.md

v0.5.1 — ExtGState Hotfix

Fixes shape and text opacity producing invalid PDF output (#46, #47).

What was broken

ExtGState objects (used for transparency) were registered in the resource dictionary but never created as actual PDF objects. This caused /GS1 0 0 R references — PDF viewers silently ignored them, making all opacity settings invisible.

What's fixed

ExtGState indirect objects are now properly created with correct /ca and /CA opacity values
Both text-only and graphics+text rendering paths are fixed
All shape types affected: circles, rectangles, ellipses, polygons, lines, Bezier curves
Text opacity via AddTextColorAlpha and custom font variants

Upgrade

go get github.com/coregx/[email protected]

v0.5.0 — "Opacity & Bezier"

Third feature release adding transparency controls and quadratic Bezier curves.

Highlights

Text Opacity — render semi-transparent text for watermarks and overlays
Quadratic Bezier Curves — native quadratic curves with exact cubic conversion
Shape Opacity Fix — opacity on shapes now works correctly across all shape types

What's New

Text Opacity (#46)

New methods for rendering text with transparency via ExtGState (/ca, /CA):

page.AddTextColorAlpha("Watermark", 200, 400, creator.Helvetica, 48, creator.Gray, 0.3)

All variants supported:

AddTextColorAlpha — standard font + color + opacity
AddTextColorRotatedAlpha — with rotation
AddTextCustomFontColorAlpha — custom TTF/OTF font
AddTextCustomFontColorRotatedAlpha — custom font + rotation

Quadratic Bezier Curves (#45)

DrawQuadBezierCurve with multi-segment paths:

segments := []creator.QuadBezierSegment{
    {
        Start:   creator.Point{X: 100, Y: 100},
        Control: creator.Point{X: 150, Y: 200},
        End:     creator.Point{X: 250, Y: 100},
    },
}
page.DrawQuadBezierCurve(segments, &creator.BezierOptions{
    Color: creator.Blue, Width: 2.0,
})

Converts to cubic via exact degree elevation (PDF only supports cubic natively). seg.ToCubic() available for manual conversion. Full styling: stroke, fill, dash, opacity, gradients.

What's Fixed

Shape Opacity (#47)

The Opacity field on shape option structs (circles, ellipses, rectangles, polygons, polylines, lines, Bezier curves) was silently dropped during the creator→writer conversion. Fixed by propagating opacity through the full pipeline: convertOptions → writer.GraphicsOp → ExtGState gs operator.

Full Changelog

https://github.com/coregx/gxpdf/blob/main/CHANGELOG.md#050---2026-02-23-opacity--bezier

v0.4.0 — Creator API

Focused on Creator API enhancements requested by the community (#41, #42).

Highlights

35+ built-in page sizes — ISO A/B/C, ANSI, photo, book, JIS, envelopes, presentation slides
Custom page dimensions — NewPageWithDimensions(widthPt, heightPt) with unit helpers
Landscape orientation — NewPageWithSize(A4, Landscape) using swapped-MediaBox
Text rotation — AddTextRotated with both standard 14 and custom TTF fonts
Angle normalization — negative and out-of-range angles normalized to [0, 360)

New Features

Page Sizes & Dimensions

// 35+ built-in sizes with IDE autocomplete
page, _ := c.NewPageWithSize(creator.A4)
page, _ := c.NewPageWithSize(creator.Slide16x9)   // 960×540 pt (PowerPoint widescreen)
page, _ := c.NewPageWithSize(creator.USTradeBook)  // 6×9 inches

// Custom dimensions with unit conversion
page, _ := c.NewPageWithDimensions(creator.InchesToPoints(8.5), creator.InchesToPoints(11))
page, _ := c.NewPageWithDimensions(creator.MMToPoints(200), creator.MMToPoints(300))

// Landscape orientation (swapped MediaBox, not /Rotate)
page, _ := c.NewPageWithSize(creator.A4, creator.Landscape)
page, _ := c.NewPageWithSize(creator.Letter, creator.Landscape)

Text Rotation

// Vertical text (90° counter-clockwise, PDF convention)
page.AddTextRotated("Sidebar", 50, 400, creator.Helvetica, 14, 90)

// With color
page.AddTextColorRotated("DRAFT", 300, 400, creator.HelveticaBold, 48, creator.Red, 45)

// Custom TTF font + rotation
page.AddTextCustomFontRotated("サイドバー", 50, 400, myFont, 14, 90)

// Negative angles normalized: -90 → 270, both produce identical output
page.AddTextRotated("Text", 100, 400, creator.Helvetica, 14, 270)  // same as -90

Unit Conversion Helpers

creator.InchesToPoints(8.5)    // 612 pt
creator.MMToPoints(210)        // ~595 pt
creator.CMToPoints(21)         // ~595 pt
creator.PointsToInches(612)    // 8.5 in
creator.PointsToMM(595)       // ~210 mm

What's Changed

35+ page sizes with map-based architecture (single source of truth)
Orientation type with Portrait/Landscape constants
NewPageWithDimensions for arbitrary page sizes
AddTextRotated, AddTextColorRotated for standard 14 fonts
AddTextCustomFontRotated, AddTextCustomFontColorRotated for TTF/OTF fonts
Angle normalization to [0, 360) per ISO 32000 §8.3
Reverse unit conversions: PointsToInches, PointsToMM, PointsToCM
Fix: use fmt.Fprintf instead of WriteString(Sprintf) (staticcheck QF1012)

Contributors

Thanks to @ajstarks for the feature requests and API feedback that shaped this release.

Full Changelog: v0.3.0...v0.4.0

@mikeschinkel

GxPDF v0.3.0 "Parser Hardening"

Major parser robustness improvements, rendering fixes, and developer experience enhancements.

Highlights

Logging Package — slog-based configurable logging (silent by default)
Image Rendering — DrawImage() / DrawImageFit() now produce visible images (JPEG + PNG + alpha)
Watermark Rendering — text watermarks with rotation and opacity
Error Propagation — public API no longer silently swallows errors
Parser Hardening — 11 community PRs fixing edge cases and a DoS vulnerability

What's New

Logging (`logging/` package)

logging.SetLogger() / logging.Logger() API
Silent by default — opt-in via any slog.Handler
Convenience methods (ExtractText, ExtractTables, GetImages) log errors via slog

Image XObject Rendering (fixes #36)

DrawImage() and DrawImageFit() now work correctly in Writer
JPEG via /Filter /DCTDecode, PNG via /Filter /FlateDecode
Alpha channel support via /SMask

Watermark Rendering

Text watermarks with rotation, opacity, and font support
ExtGState for transparency

What's Fixed

Error Propagation (fixes #35)

ExtractTextFromPage() now returns actual errors instead of empty strings
All convenience methods log errors via slog instead of silently discarding them

Parser Robustness (11 PRs by @mikeschinkel)

Leading whitespace before %PDF- header
CR line endings in startxref
Trailing garbage after %%EOF (progressive search)
CMap uint16 infinite loop — DoS vulnerability fix
Token position after indirect Length
Progressive xref stream buffer (1KB → 4KB)
/W [0 0 0] in xref streams
PNG predictor support — all 5 filter types
Off-by-one xref object recovery with lenient parsing

Contributors

@mikeschinkel — 11 PRs (parser hardening, logging package)

Full Changelog

See CHANGELOG.md for details.

Installation:

# CLI tool:
go install github.com/coregx/gxpdf/cmd/[email protected]

# Library:
go get github.com/coregx/[email protected]

# Or download binary for your platform above

Quick Start (CLI):

# Extract tables (100% accuracy!):
gxpdf tables invoice.pdf --format csv

# PDF info:
gxpdf info document.pdf

# Merge PDFs:
gxpdf merge doc1.pdf doc2.pdf -o combined.pdf

Quick Start (Library):

doc, _ := gxpdf.Open("invoice.pdf")
defer doc.Close()

tables := doc.ExtractTables()
for _, t := range tables {
    fmt.Println(t.Rows())
}

Documentation: https://github.com/coregx/gxpdf#readme

Report Issues: https://github.com/coregx/gxpdf/issues

@kolkov

Hotfix: Hybrid-Reference PDF Support

Fixes parsing of MS Word-generated PDFs that use incremental updates with /Prev chain and /XRefStm hybrid cross-reference structure.

Fixed

/Prev Chain Support — follow trailer /Prev links to merge all cross-reference sections from incremental updates
/XRefStm Support — parse supplementary cross-reference streams in hybrid-reference PDFs
Cycle Detection — prevent infinite loops on malformed /Prev chains
Depth Limit — cap chain traversal at 100 levels

Details

MS Word (and other editors) save PDFs with incremental updates, producing multiple xref sections linked via /Prev. GxPDF v0.2.0 only read the last section (which may have 0 entries), causing object N not found in xref table errors.

The parser now walks the full chain, merging entries with newer-wins semantics per PDF 1.7 spec Section 7.5.6.

Closes #19

Install

go get github.com/coregx/[email protected]

GxPDF v0.2.1 (2026-02-05T14:48:46Z)

Enterprise-grade PDF library for Go!

100% table extraction accuracy on bank statements
PDF merge, split, text extraction
AES-256 encryption support

Changelog

🐛 Bug Fixes

f5f2a32: fix: support /Prev chain and /XRefStm in hybrid-reference PDFs (@kolkov)

Installation:

# CLI tool:
go install github.com/coregx/gxpdf/cmd/[email protected]

# Library:
go get github.com/coregx/[email protected]

# Or download binary for your platform above

Quick Start (CLI):

# Extract tables (100% accuracy!):
gxpdf tables invoice.pdf --format csv

# PDF info:
gxpdf info document.pdf

# Merge PDFs:
gxpdf merge doc1.pdf doc2.pdf -o combined.pdf

Quick Start (Library):

doc, _ := gxpdf.Open("invoice.pdf")
defer doc.Close()

tables := doc.ExtractTables()
for _, t := range tables {
    fmt.Println(t.Rows())
}

Documentation: https://github.com/coregx/gxpdf#readme

Report Issues: https://github.com/coregx/gxpdf/issues

Releases: coregx/gxpdf

v0.8.2

v0.8.2

Bug Fixes

Per-glyph text merging (wkhtmltopdf)

Form XObject text extraction (TCPDF)

Verified On

Stats

Install

Contributors

Uh oh!

v0.8.1

v0.8.1

Bug Fix

What was fixed

Affected PDFs

Stats

Install

Contributors

Uh oh!

v0.8.0

v0.8.0 "Extraction & Access"

Highlights

Positioned Text Extraction (#68)

In-Memory PDF Opening (#68)

Embedded Font Extraction (#67)

Vector Graphics Extraction (#66)

Bug Fixes

Internal Changes

Stats

Contributors

Install

Contributors

Uh oh!

v0.7.0

v0.7.0 "Builder & Signatures"

Highlights

Declarative Builder API

Enterprise Tables

Rich Text

Digital Signatures (PAdES B-B + B-T)

What's New

Bug Fixes

Stats

Uh oh!

v0.6.0

GxPDF v0.6.0 — Encrypted PDF Reading & Gradient Rendering

Highlights

What's New

Encrypted PDF Reading (#34)

Full Gradient Rendering (#57)

What's Fixed

Installation

Uh oh!

v0.5.1

v0.5.1 — ExtGState Hotfix

What was broken

What's fixed

Upgrade

Uh oh!

v0.5.0

v0.5.0 — "Opacity & Bezier"

Highlights

What's New

Text Opacity (#46)

Quadratic Bezier Curves (#45)

What's Fixed

Shape Opacity (#47)

Full Changelog

Uh oh!

v0.4.0

v0.4.0 — Creator API

Highlights

New Features

Page Sizes & Dimensions

Text Rotation

Unit Conversion Helpers

What's Changed

Contributors

Uh oh!

Logging (`logging/` package)