Releases: coregx/gxpdf
v0.8.2
Follow-up patch for #74, fixing two additional issues discovered by testing with real-world shipping label PDFs provided by @iv7dev.
Bug Fixes
Per-glyph text merging (wkhtmltopdf)
PDFs generated by wkhtmltopdf 0.12 / Qt 4.8 emit one Tj operator per glyph with Td repositioning between each character. This caused text extraction to produce "D O M I C I L I O" instead of "DOMICILIO".
Fix: New mergeAdjacentElements() collapses consecutive single-glyph TextElements that share the same font, size, and text line into word-level elements. New AssembleText() replaces the naive elem.Text + " " concatenation with spatial gap detection — spaces are inserted only when the horizontal gap exceeds the font size threshold, and newlines on Y-coordinate changes.
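The merge and gap-detection steps described above can be sketched roughly as follows. This is an illustration only, not the library's code: the TextElement fields, the helper names mergeAdjacent and assembleText, and the 0.3 × font size gap threshold are all assumptions, and a real implementation would compare baselines with a tolerance rather than exact equality.

```go
package main

import (
	"fmt"
	"strings"
)

// TextElement is a hypothetical stand-in for a positioned text run.
type TextElement struct {
	Text     string
	X, Y     float64 // left edge and baseline
	Width    float64
	FontName string
	FontSize float64
}

// mergeAdjacent collapses consecutive elements that share font, size, and
// baseline and sit closer than gapFactor*FontSize into a single element.
func mergeAdjacent(elems []TextElement, gapFactor float64) []TextElement {
	var out []TextElement
	for _, e := range elems {
		if n := len(out); n > 0 {
			last := &out[n-1]
			gap := e.X - (last.X + last.Width)
			sameRun := last.FontName == e.FontName &&
				last.FontSize == e.FontSize && last.Y == e.Y
			if sameRun && gap <= gapFactor*e.FontSize {
				last.Text += e.Text
				last.Width = e.X + e.Width - last.X
				continue
			}
		}
		out = append(out, e)
	}
	return out
}

// assembleText joins elements, writing a newline when the baseline changes
// and a space only when the horizontal gap exceeds gapFactor*FontSize.
func assembleText(elems []TextElement, gapFactor float64) string {
	var sb strings.Builder
	for i, e := range elems {
		if i > 0 {
			prev := elems[i-1]
			switch {
			case prev.Y != e.Y:
				sb.WriteByte('\n')
			case e.X-(prev.X+prev.Width) > gapFactor*e.FontSize:
				sb.WriteByte(' ')
			}
		}
		sb.WriteString(e.Text)
	}
	return sb.String()
}

func main() {
	glyph := func(s string, x, y float64) TextElement {
		return TextElement{Text: s, X: x, Y: y, Width: 6, FontName: "F1", FontSize: 10}
	}
	glyphs := []TextElement{
		glyph("D", 0, 100), glyph("O", 6, 100), glyph("M", 12, 100),
		glyph("S", 30, 100), glyph("A", 36, 100),
		glyph("X", 0, 80),
	}
	words := mergeAdjacent(glyphs, 0.3)
	fmt.Printf("%q\n", assembleText(words, 0.3)) // prints "DOM SA\nX"
}
```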
Form XObject text extraction (TCPDF)
PDFs generated by TCPDF 5.9 place all page content inside Form XObjects invoked via the Do operator. Since Do was not handled by the text extractor, all pages with CID fonts returned empty strings.
Fix: New processFormXObject() resolves named XObjects from page resources, pushes the XObject's own /Resources dictionary onto a stack (so CID font decoders are correctly loaded from the XObject scope), processes operators recursively, and pops resources on return. An 8-level depth guard prevents infinite recursion on malformed PDFs.
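A minimal sketch of the resource-stack recursion described above, under simplifying assumptions: the Op, Resources, Form, and extractor types are hypothetical, only the Tf, Tj, and Do operators are modelled, and counting the page as depth 0 is one possible reading of the 8-level guard.

```go
package main

import (
	"errors"
	"fmt"
)

// Op is a simplified content-stream operator (hypothetical type).
type Op struct {
	Name    string // "Tf", "Tj", or "Do"
	Operand string
}

// Resources maps resource names to fonts and form XObjects.
type Resources struct {
	Fonts    map[string]string
	XObjects map[string]*Form
}

// Form is a form XObject with its own optional /Resources.
type Form struct {
	Resources *Resources
	Ops       []Op
}

const maxFormDepth = 8

var errTooDeep = errors.New("form XObject nesting too deep")

type extractor struct {
	scopes []*Resources // resource stack; the top entry is the active scope
	font   string
	out    []string
}

func (x *extractor) run(ops []Op, depth int) error {
	if depth > maxFormDepth {
		return errTooDeep
	}
	for _, op := range ops {
		res := x.scopes[len(x.scopes)-1]
		switch op.Name {
		case "Tf": // fonts resolve against the active scope
			x.font = res.Fonts[op.Operand]
		case "Tj":
			x.out = append(x.out, x.font+": "+op.Operand)
		case "Do":
			form, ok := res.XObjects[op.Operand]
			if !ok {
				continue // unknown or non-form XObject: skip
			}
			scope := form.Resources
			if scope == nil {
				scope = res // no own /Resources: reuse the caller's
			}
			saved := x.font
			x.scopes = append(x.scopes, scope) // push
			err := x.run(form.Ops, depth+1)
			x.scopes = x.scopes[:len(x.scopes)-1] // pop
			x.font = saved
			if err != nil {
				return err
			}
		}
	}
	return nil
}

func extract(page *Resources, ops []Op) ([]string, error) {
	x := &extractor{scopes: []*Resources{page}}
	err := x.run(ops, 0)
	return x.out, err
}

func main() {
	form := &Form{
		Resources: &Resources{Fonts: map[string]string{"F1": "DejaVuSans-Bold"}},
		Ops:       []Op{{Name: "Tf", Operand: "F1"}, {Name: "Tj", Operand: "CLASICO"}},
	}
	page := &Resources{XObjects: map[string]*Form{"Fm0": form}}
	out, err := extract(page, []Op{{Name: "Do", Operand: "Fm0"}})
	fmt.Println(out, err)
}
```

Because the font F1 exists only in the form's own resources, the text is recovered only when the form's scope is on top of the stack while its operators run.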
Verified On
| PDF | Generator | Fonts | Before | After |
|---|---|---|---|---|
| Andreani shipping label (12 pages) | wkhtmltopdf 0.12 | Roboto-Medium/Bold, CID TrueType, Identity-H | "D O M I C I L I O" | "DOMICILIO" |
| Correo Argentino label (5 pages) | TCPDF 5.9 | DejaVuSans-Bold/Oblique (subset), CID TrueType, Identity-H | "" (empty) | "CLASICO\nF.PAGADO 0001571918..." |
Stats
- 5 files changed, +1393 lines (including 42 new tests with real-world PDF fixtures)
- Zero breaking changes
Install
go get github.com/coregx/gxpdf@v0.8.2
v0.8.1
Patch release fixing CID TrueType font text extraction.
Bug Fix
CID TrueType text extraction now works correctly (#74, reported by @iv7dev)
ExtractTextFromPage() previously returned raw CID glyph IDs (\x01 \x02 \x03) or empty strings for PDFs with CID TrueType fonts using Identity-H encoding — common in documents generated by wkhtmltopdf, TCPDF, and similar tools. Now correctly decodes to Unicode text.
What was fixed
- beginbfrange array format: ToUnicode CMap entries in [<dst0> <dst1> ...] array form were silently skipped, producing an empty mapping table. Now fully parsed.
- Identity-H 2-byte detection: all loadFontDecoder fallback paths now correctly enable 2-byte glyph mode for Identity-H/V and Type0 composite fonts.
- begincodespacerange parsing: byte width is now determined from the PDF's codespace range declaration (the authoritative signal per PDF spec §9.7.5).
- UTF-16BE surrogate pairs: CMap destinations encoding supplementary Unicode characters (U+10000+) are now correctly decoded via utf16.DecodeRune.
- Composite font guard: the looksLikeGarbage heuristic no longer downgrades Type0 composite fonts from 2-byte to 1-byte decoding.
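For the surrogate-pair item, a small sketch of decoding a ToUnicode destination given as UTF-16BE hex digits. The function name decodeUTF16BEHex is hypothetical and the library's parser may be structured differently. Only utf16.IsSurrogate and utf16.DecodeRune from the standard library are relied on.

```go
package main

import (
	"encoding/hex"
	"fmt"
	"unicode/utf16"
)

// decodeUTF16BEHex converts hex digits of UTF-16BE code units, such as
// "D83DDE00", into a Go string, combining surrogate pairs where present.
func decodeUTF16BEHex(h string) (string, error) {
	raw, err := hex.DecodeString(h)
	if err != nil {
		return "", err
	}
	if len(raw)%2 != 0 {
		return "", fmt.Errorf("odd byte count %d", len(raw))
	}
	var runes []rune
	for i := 0; i < len(raw); i += 2 {
		u := rune(raw[i])<<8 | rune(raw[i+1])
		if utf16.IsSurrogate(u) && i+3 < len(raw) {
			lo := rune(raw[i+2])<<8 | rune(raw[i+3])
			// DecodeRune returns U+FFFD when (u, lo) is not a valid pair.
			if r := utf16.DecodeRune(u, lo); r != '\uFFFD' {
				runes = append(runes, r)
				i += 2 // the low surrogate has been consumed
				continue
			}
		}
		runes = append(runes, u) // a lone surrogate becomes U+FFFD in string()
	}
	return string(runes), nil
}

func main() {
	for _, h := range []string{"0041", "00440045", "D83DDE00"} {
		s, err := decodeUTF16BEHex(h)
		fmt.Printf("%s -> %q (%v)\n", h, s, err)
	}
}
```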
Affected PDFs
Any PDF using CID TrueType fonts with Identity-H encoding — this includes most PDFs generated by:
- wkhtmltopdf (Roboto, DejaVu Sans, etc.)
- TCPDF (DejaVuSans subsets)
- Prince XML
- Many CJK-capable PDF generators
Stats
- 4 files changed, +891 lines (including 33 new tests)
- Zero breaking changes
Install
go get github.com/coregx/gxpdf@v0.8.1
v0.8.0
v0.8.0 "Extraction & Access"
Community-driven release delivering three major extraction capabilities requested by @joa23, plus a bug fix from @yurikilian.
Highlights
Positioned Text Extraction (#68)
Extract text with full positional metadata — coordinates, dimensions, font name, and font size:
elements, _ := doc.ExtractTextElementsFromPage(1)
for _, e := range elements {
	fmt.Printf("%q at (%.1f, %.1f) font=%s size=%.1f\n",
		e.Text, e.X, e.Y, e.FontName, e.FontSize)
}
In-Memory PDF Opening (#68)
Open PDFs from []byte without writing to disk — ideal for server-side workflows:
data, _ := io.ReadAll(httpResponse.Body)
doc, _ := gxpdf.OpenFromBytes(data)
defer doc.Close()
Also supports encrypted PDFs: OpenFromBytesWithPassword(data, "secret").
Embedded Font Extraction (#67)
Extract TTF/OTF font binary data for round-trip preservation:
fonts, _ := doc.GetEmbeddedFonts()
for _, f := range fonts {
	fmt.Printf("Font: %s (%s), %d bytes\n", f.Name, f.Subtype, len(f.Data))
}
Supports TrueType and Type0/CIDFontType2 fonts. Extracted data can be fed back into the Creator API via fonts.LoadTTFFromBytes().
Vector Graphics Extraction (#66)
Extract paths, bezier curves, colors, and opacity from parsed PDFs:
paths, _ := doc.GetVectorGraphicsForPage(1)
for _, p := range paths {
	fmt.Printf("Path: %d verbs, mode=%v, stroke=%v\n",
		len(p.Verbs), p.PaintMode, p.StrokeColor)
}
Features:
- Path verb + coordinates model (MoveTo, LineTo, CubicTo, QuadTo, Close), compatible with gogpu/gg
- Graphics state stack (q/Q), CTM tracking (cm), CMYK→RGB conversion
- Stroke, fill, and fill-stroke paint modes with separate colors and opacity
- Line cap, line join, miter limit extraction
Bug Fixes
- uint16 overflow in FontSubset.MeasureString — long strings no longer produce incorrect width calculations, fixing text wrapping in Builder API (#69, @yurikilian)
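To illustrate the class of bug (not the library's actual code), here is one way a uint16 accumulator can silently wrap when summing glyph advances for a long string. The function names and the 600-unit advance are made up for the example.

```go
package main

import "fmt"

// measureNarrow sums advances in a uint16 and wraps past 65535.
func measureNarrow(advances []uint16) uint16 {
	var total uint16
	for _, a := range advances {
		total += a // wraps modulo 65536 without any error
	}
	return total
}

// measureWide sums the same advances in a wider type.
func measureWide(advances []uint16) uint32 {
	var total uint32
	for _, a := range advances {
		total += uint32(a)
	}
	return total
}

func main() {
	advances := make([]uint16, 120) // 120 glyphs, 600 font units each
	for i := range advances {
		advances[i] = 600
	}
	fmt.Println(measureNarrow(advances), measureWide(advances)) // prints "6464 72000"
}
```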
Internal Changes
- Parser Reader refactored from *os.File to io.ReadSeeker interface
- Shared decodeStreamData() utility for FlateDecode decompression
- New VectorParser separate from table detection GraphicsParser
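As a rough illustration of the first two items: *bytes.Reader satisfies io.ReadSeeker, which is what makes in-memory parsing possible, and FlateDecode data is a zlib stream that the standard library can inflate. The helper names inflate and deflate are hypothetical and are not the library's decodeStreamData().

```go
package main

import (
	"bytes"
	"compress/zlib"
	"fmt"
	"io"
)

// A *bytes.Reader satisfies io.ReadSeeker, so a parser written against
// that interface accepts both files and in-memory buffers.
var _ io.ReadSeeker = (*bytes.Reader)(nil)

// inflate decompresses a FlateDecode (zlib) stream body.
func inflate(data []byte) ([]byte, error) {
	zr, err := zlib.NewReader(bytes.NewReader(data))
	if err != nil {
		return nil, err
	}
	defer zr.Close()
	return io.ReadAll(zr)
}

// deflate compresses data so the example has something to decode.
// Writes to a bytes.Buffer do not fail, so errors are ignored here.
func deflate(data []byte) []byte {
	var buf bytes.Buffer
	zw := zlib.NewWriter(&buf)
	zw.Write(data)
	zw.Close()
	return buf.Bytes()
}

func main() {
	stream := deflate([]byte("BT /F1 12 Tf (Hello) Tj ET"))
	content, err := inflate(stream)
	fmt.Printf("%s %v\n", content, err)
}
```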
Stats
- +4,000 lines of new production code
- 130 new tests across 6 packages
- All CI checks passing (Ubuntu, macOS, Windows)
Contributors
- @yurikilian — first contribution! (#69)
- @joa23 — feature requests that shaped this release (#66, #67, #68)
Install
go get github.com/coregx/gxpdf@v0.8.0
v0.7.0
v0.7.0 "Builder & Signatures"
The largest feature release in GxPDF history — ~13,500 lines of new code across 3 major features.
Highlights
Declarative Builder API
QuestPDF-inspired document builder with 12-column grid, automatic pagination, and composable Go closures. Users import only builder/ — no internal packages leak.
Enterprise Tables
Full-featured tables with ColSpan, RowSpan, repeating headers on overflow pages, page splitting, Auto/Fixed/Fr column widths, cell padding/borders/backgrounds, and zebra stripes.
Rich Text
Multi-style inline text — mix bold, italic, colors, and links within a single paragraph. Baseline alignment for mixed font sizes, justified text with proportional spacing.
Digital Signatures (PAdES B-B + B-T)
Sign and verify PDFs with zero external dependencies. RSA + ECDSA support, CMS/PKCS#7 with ESS signing-certificate-v2, RFC 3161 timestamping, incremental PDF update preserving existing signatures.
What's New
- builder/: Declarative Builder API (12-col grid, Row/Col, Text, Tables, RichText, Images, Lines, Spacers)
- layout/: Pure computation layout engine (zero PDF dependencies)
- signature/: Digital signatures (PAdES B-B/B-T, sign + verify)
- Enterprise Tables: ColSpan, RowSpan, header repeat, page split, Auto/Fixed/Pct/Fr columns
- Rich Text: RichText() with Span() and Link() for mixed inline styles
- Text Measurement API: MeasureText(), FontAscender(), FontLineHeight() exported
- Builder-owned types: Value, Color, Size (no layout/ import needed)
- Page break control: PageBreak(), KeepTogether(), EnsureSpace()
- Page numbers: Two-pass placeholder resolution
- 13 predefined colors + Hex() parser
Bug Fixes
- Half-leading for optically centered text in line boxes (CSS model)
- Pct double-resolution fix in nested box layout
- Floating-point epsilon in text overflow check
Stats
| Package | Production | Tests | Coverage |
|---|---|---|---|
| layout/ | ~2,600 | ~2,500 | 85.7% |
| builder/ | ~2,600 | ~2,200 | 80.6% |
| signature/ | 1,886 | 1,268 | 80.7% |
Full Changelog: v0.6.0...v0.7.0
v0.6.0
GxPDF v0.6.0 — Encrypted PDF Reading & Gradient Rendering
Two major features in this release: transparent reading of password-protected PDFs and full gradient rendering via PDF Shading dictionaries.
Highlights
- Encrypted PDF Reading: open password-protected PDFs with Open() or OpenWithPassword() (#34)
- Full Gradient Rendering: linear and radial gradients now render as real color transitions, not solid colors (#57)
- ExtGState Fix: shape and text opacity produce valid PDF output (#46, #47)
What's New
Encrypted PDF Reading (#34)
Read PDFs encrypted with Standard Security Handler:
| Algorithm | Version | Status |
|---|---|---|
| RC4 40-bit | V=1, R=2 | Supported |
| RC4 128-bit | V=2, R=3 | Supported |
| AES-128 | V=4, R=4 | Supported |
- Open() transparently handles empty-password encrypted PDFs (permissions-only, the most common case)
- OpenWithPassword() for PDFs with non-empty user passwords
- ErrPasswordRequired sentinel error for wrong/missing password
// Permissions-only PDFs open transparently
doc, _ := gxpdf.Open("bank_statement.pdf")
// Password-protected PDFs
doc, err := gxpdf.OpenWithPassword("protected.pdf", "secret")
if errors.Is(err, gxpdf.ErrPasswordRequired) {
	log.Fatal("Wrong password")
}
Full Gradient Rendering (#57)
Gradients now render as real PDF Shading objects instead of solid-color fallback:
- Linear gradients — ShadingType 2 (axial) with multi-stop support
- Radial gradients — ShadingType 3 with focal point control
- Multi-stop — Type 3 stitching functions for 3+ color stops
- All shapes — rectangles, circles, ellipses, polygons, Bezier curves
grad := creator.NewLinearGradient(50, 650, 250, 650)
grad.AddColorStop(0, creator.Red)
grad.AddColorStop(0.5, creator.Yellow)
grad.AddColorStop(1, creator.Green)
page.DrawRect(50, 620, 200, 60, &creator.RectOptions{
	FillGradient: grad,
})
What's Fixed
- ExtGState Object Creation — opacity on shapes and text now produces valid PDF indirect objects (#46, #47)
Installation
# CLI tool:
go install github.com/coregx/gxpdf/cmd/gxpdf@v0.6.0
# Library:
go get github.com/coregx/gxpdf@v0.6.0
Full Changelog: https://github.com/coregx/gxpdf/blob/main/CHANGELOG.md
v0.5.1
v0.5.1 — ExtGState Hotfix
Fixes shape and text opacity producing invalid PDF output (#46, #47).
What was broken
ExtGState objects (used for transparency) were registered in the resource dictionary but never created as actual PDF objects. This caused /GS1 0 0 R references — PDF viewers silently ignored them, making all opacity settings invisible.
What's fixed
- ExtGState indirect objects are now properly created with correct /ca and /CA opacity values
- Both text-only and graphics+text rendering paths are fixed
- All shape types affected: circles, rectangles, ellipses, polygons, lines, Bezier curves
- Text opacity via AddTextColorAlpha and custom font variants
Upgrade
go get github.com/coregx/gxpdf@v0.5.1
v0.5.0
v0.5.0 — "Opacity & Bezier"
Third feature release adding transparency controls and quadratic Bezier curves.
Highlights
- Text Opacity — render semi-transparent text for watermarks and overlays
- Quadratic Bezier Curves — native quadratic curves with exact cubic conversion
- Shape Opacity Fix — opacity on shapes now works correctly across all shape types
What's New
Text Opacity (#46)
New methods for rendering text with transparency via ExtGState (/ca, /CA):
page.AddTextColorAlpha("Watermark", 200, 400, creator.Helvetica, 48, creator.Gray, 0.3)
All variants supported:
- AddTextColorAlpha: standard font + color + opacity
- AddTextColorRotatedAlpha: with rotation
- AddTextCustomFontColorAlpha: custom TTF/OTF font
- AddTextCustomFontColorRotatedAlpha: custom font + rotation
Quadratic Bezier Curves (#45)
DrawQuadBezierCurve with multi-segment paths:
segments := []creator.QuadBezierSegment{
	{
		Start:   creator.Point{X: 100, Y: 100},
		Control: creator.Point{X: 150, Y: 200},
		End:     creator.Point{X: 250, Y: 100},
	},
}
page.DrawQuadBezierCurve(segments, &creator.BezierOptions{
	Color: creator.Blue, Width: 2.0,
})
Converts to cubic via exact degree elevation (PDF only supports cubic natively). seg.ToCubic() available for manual conversion. Full styling: stroke, fill, dash, opacity, gradients.
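Degree elevation itself is standard: a quadratic with points P0, P1, P2 equals the cubic with control points C1 = P0 + 2/3·(P1 − P0) and C2 = P2 + 2/3·(P1 − P2). The sketch below checks that identity numerically. The Point type and function names are illustrative and independent of the creator package.

```go
package main

import (
	"fmt"
	"math"
)

type Point struct{ X, Y float64 }

// lerp returns a + t*(b-a).
func lerp(a, b Point, t float64) Point {
	return Point{a.X + (b.X-a.X)*t, a.Y + (b.Y-a.Y)*t}
}

// quadToCubic returns the two inner control points of the equivalent cubic.
func quadToCubic(p0, p1, p2 Point) (c1, c2 Point) {
	return lerp(p0, p1, 2.0/3.0), lerp(p2, p1, 2.0/3.0)
}

// quadAt evaluates the quadratic Bezier at parameter t.
func quadAt(p0, p1, p2 Point, t float64) Point {
	u := 1 - t
	return Point{
		u*u*p0.X + 2*u*t*p1.X + t*t*p2.X,
		u*u*p0.Y + 2*u*t*p1.Y + t*t*p2.Y,
	}
}

// cubicAt evaluates the cubic Bezier at parameter t.
func cubicAt(p0, c1, c2, p3 Point, t float64) Point {
	u := 1 - t
	a, b, c, d := u*u*u, 3*u*u*t, 3*u*t*t, t*t*t
	return Point{
		a*p0.X + b*c1.X + c*c2.X + d*p3.X,
		a*p0.Y + b*c1.Y + c*c2.Y + d*p3.Y,
	}
}

// dist is the Euclidean distance between two points.
func dist(a, b Point) float64 { return math.Hypot(a.X-b.X, a.Y-b.Y) }

func main() {
	p0, p1, p2 := Point{100, 100}, Point{150, 200}, Point{250, 100}
	c1, c2 := quadToCubic(p0, p1, p2)
	fmt.Printf("c1=(%.2f, %.2f) c2=(%.2f, %.2f)\n", c1.X, c1.Y, c2.X, c2.Y)
	for _, t := range []float64{0, 0.25, 0.5, 0.75, 1} {
		fmt.Printf("t=%.2f diff=%.2e\n", t,
			dist(quadAt(p0, p1, p2, t), cubicAt(p0, c1, c2, p2, t)))
	}
}
```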
What's Fixed
Shape Opacity (#47)
The Opacity field on shape option structs (circles, ellipses, rectangles, polygons, polylines, lines, Bezier curves) was silently dropped during the creator→writer conversion. Fixed by propagating opacity through the full pipeline: convertOptions → writer.GraphicsOp → ExtGState gs operator.
Full Changelog
v0.4.0
v0.4.0 — Creator API
Focused on Creator API enhancements requested by the community (#41, #42).
Highlights
- 35+ built-in page sizes: ISO A/B/C, ANSI, photo, book, JIS, envelopes, presentation slides
- Custom page dimensions: NewPageWithDimensions(widthPt, heightPt) with unit helpers
- Landscape orientation: NewPageWithSize(A4, Landscape) using swapped-MediaBox
- Text rotation: AddTextRotated with both standard 14 and custom TTF fonts
- Angle normalization: negative and out-of-range angles normalized to [0, 360)
New Features
Page Sizes & Dimensions
// 35+ built-in sizes with IDE autocomplete
page, _ := c.NewPageWithSize(creator.A4)
page, _ := c.NewPageWithSize(creator.Slide16x9) // 960×540 pt (PowerPoint widescreen)
page, _ := c.NewPageWithSize(creator.USTradeBook) // 6×9 inches
// Custom dimensions with unit conversion
page, _ := c.NewPageWithDimensions(creator.InchesToPoints(8.5), creator.InchesToPoints(11))
page, _ := c.NewPageWithDimensions(creator.MMToPoints(200), creator.MMToPoints(300))
// Landscape orientation (swapped MediaBox, not /Rotate)
page, _ := c.NewPageWithSize(creator.A4, creator.Landscape)
page, _ := c.NewPageWithSize(creator.Letter, creator.Landscape)
Text Rotation
// Vertical text (90° counter-clockwise, PDF convention)
page.AddTextRotated("Sidebar", 50, 400, creator.Helvetica, 14, 90)
// With color
page.AddTextColorRotated("DRAFT", 300, 400, creator.HelveticaBold, 48, creator.Red, 45)
// Custom TTF font + rotation
page.AddTextCustomFontRotated("サイドバー", 50, 400, myFont, 14, 90)
// Negative angles normalized: -90 → 270, both produce identical output
page.AddTextRotated("Text", 100, 400, creator.Helvetica, 14, 270) // same as -90
Unit Conversion Helpers
creator.InchesToPoints(8.5) // 612 pt
creator.MMToPoints(210) // ~595 pt
creator.CMToPoints(21) // ~595 pt
creator.PointsToInches(612) // 8.5 in
creator.PointsToMM(595) // ~210 mm
What's Changed
- 35+ page sizes with map-based architecture (single source of truth)
- Orientation type with Portrait/Landscape constants
- NewPageWithDimensions for arbitrary page sizes
- AddTextRotated, AddTextColorRotated for standard 14 fonts
- AddTextCustomFontRotated, AddTextCustomFontColorRotated for TTF/OTF fonts
- Angle normalization to [0, 360) per ISO 32000 §8.3
- Reverse unit conversions: PointsToInches, PointsToMM, PointsToCM
- Fix: use fmt.Fprintf instead of WriteString(Sprintf) (staticcheck QF1012)
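The arithmetic behind angle normalization and the unit helpers is simple enough to sketch. The functions below are stand-ins with hypothetical names, not the creator package's implementation. They rely only on 1 inch = 72 pt and 1 inch = 25.4 mm.

```go
package main

import (
	"fmt"
	"math"
)

// normalizeAngle maps any angle in degrees into [0, 360).
func normalizeAngle(deg float64) float64 {
	a := math.Mod(deg, 360) // result keeps the sign of deg
	if a < 0 {
		a += 360
	}
	if a == 0 || a >= 360 { // covers -0 and rounding up to exactly 360
		return 0
	}
	return a
}

// inchesToPoints converts inches to PDF points (72 per inch).
func inchesToPoints(in float64) float64 { return in * 72 }

// mmToPoints converts millimetres to PDF points.
func mmToPoints(mm float64) float64 { return mm * 72 / 25.4 }

func main() {
	for _, d := range []float64{-90, 270, 450, -360, 720} {
		fmt.Printf("%g -> %g\n", d, normalizeAngle(d))
	}
	fmt.Printf("8.5 in = %g pt, 210 mm = %.2f pt\n",
		inchesToPoints(8.5), mmToPoints(210))
}
```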
Contributors
Thanks to @ajstarks for the feature requests and API feedback that shaped this release.
Full Changelog: v0.3.0...v0.4.0
v0.3.0
GxPDF v0.3.0 "Parser Hardening"
Major parser robustness improvements, rendering fixes, and developer experience enhancements.
Highlights
- Logging Package: slog-based configurable logging (silent by default)
- Image Rendering: DrawImage()/DrawImageFit() now produce visible images (JPEG + PNG + alpha)
- Watermark Rendering: text watermarks with rotation and opacity
- Error Propagation: public API no longer silently swallows errors
- Parser Hardening: 11 community PRs fixing edge cases and a DoS vulnerability
What's New
Logging (logging/ package)
- logging.SetLogger()/logging.Logger() API
- Silent by default: opt-in via any slog.Handler
- Convenience methods (ExtractText, ExtractTables, GetImages) log errors via slog
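One plausible shape for a silent-by-default, opt-in logger built on log/slog is sketched below. The SetLogger and Logger functions here are local stand-ins that mirror the names above, the real package's signatures may differ, and newBufferLogger is a helper invented for the example.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"log/slog"
	"sync/atomic"
)

// current holds the active logger (hypothetical package state).
var current atomic.Pointer[slog.Logger]

func init() {
	// Silent by default: records are written to io.Discard.
	current.Store(slog.New(slog.NewTextHandler(io.Discard, nil)))
}

// SetLogger swaps in a caller-supplied logger; nil is ignored.
func SetLogger(l *slog.Logger) {
	if l != nil {
		current.Store(l)
	}
}

// Logger returns the active logger.
func Logger() *slog.Logger { return current.Load() }

// newBufferLogger returns a logger that writes text records into a buffer.
func newBufferLogger() (*slog.Logger, *bytes.Buffer) {
	buf := &bytes.Buffer{}
	return slog.New(slog.NewTextHandler(buf, nil)), buf
}

func main() {
	Logger().Error("discarded: nobody has opted in yet")

	l, buf := newBufferLogger()
	SetLogger(l)
	Logger().Error("extract failed", "page", 3)
	fmt.Print(buf.String())
}
```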
Image XObject Rendering (fixes #36)
- DrawImage() and DrawImageFit() now work correctly in Writer
- JPEG via /Filter /DCTDecode, PNG via /Filter /FlateDecode
- Alpha channel support via /SMask
Watermark Rendering
- Text watermarks with rotation, opacity, and font support
- ExtGState for transparency
What's Fixed
Error Propagation (fixes #35)
- ExtractTextFromPage() now returns actual errors instead of empty strings
- All convenience methods log errors via slog instead of silently discarding them
Parser Robustness (11 PRs by @mikeschinkel)
- Leading whitespace before %PDF- header
- CR line endings in startxref
- Trailing garbage after %%EOF (progressive search)
- CMap uint16 infinite loop: DoS vulnerability fix
- Token position after indirect Length
- Progressive xref stream buffer (1KB → 4KB)
- /W [0 0 0] in xref streams
- PNG predictor support: all 5 filter types
- Off-by-one xref object recovery with lenient parsing
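The release notes do not show the CMap loop itself, so the following is only a guess at the general pattern: iterating an inclusive uint16 range with a uint16 counter never terminates when the upper bound is 0xFFFF, because the counter wraps to 0 instead of exceeding it. Widening the counter avoids that. expandRange is a hypothetical name.

```go
package main

import "fmt"

// expandRange calls visit for every code in [lo, hi] inclusive.
// A naive `for c := lo; c <= hi; c++` with c of type uint16 loops
// forever when hi == 0xFFFF; counting in uint32 terminates.
func expandRange(lo, hi uint16, visit func(uint16)) {
	if lo > hi {
		return // malformed range: nothing to expand
	}
	for c := uint32(lo); c <= uint32(hi); c++ {
		visit(uint16(c))
	}
}

func main() {
	n := 0
	expandRange(0xFFFE, 0xFFFF, func(uint16) { n++ })
	fmt.Println(n) // prints 2
}
```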
Contributors
- @mikeschinkel — 11 PRs (parser hardening, logging package)
Full Changelog
See CHANGELOG.md for details.
Installation:
# CLI tool:
go install github.com/coregx/gxpdf/cmd/gxpdf@v0.3.0
# Library:
go get github.com/coregx/gxpdf@v0.3.0
# Or download binary for your platform above
Quick Start (CLI):
# Extract tables (100% accuracy!):
gxpdf tables invoice.pdf --format csv
# PDF info:
gxpdf info document.pdf
# Merge PDFs:
gxpdf merge doc1.pdf doc2.pdf -o combined.pdf
Quick Start (Library):
doc, _ := gxpdf.Open("invoice.pdf")
defer doc.Close()
tables := doc.ExtractTables()
for _, t := range tables {
	fmt.Println(t.Rows())
}
Documentation: https://github.com/coregx/gxpdf#readme
Report Issues: https://github.com/coregx/gxpdf/issues
v0.2.1
Hotfix: Hybrid-Reference PDF Support
Fixes parsing of MS Word-generated PDFs that use incremental updates with /Prev chain and /XRefStm hybrid cross-reference structure.
Fixed
- /Prev Chain Support: follow trailer /Prev links to merge all cross-reference sections from incremental updates
- /XRefStm Support: parse supplementary cross-reference streams in hybrid-reference PDFs
- Cycle Detection: prevent infinite loops on malformed /Prev chains
- Depth Limit: cap chain traversal at 100 levels
Details
MS Word (and other editors) save PDFs with incremental updates, producing multiple xref sections linked via /Prev. GxPDF v0.2.0 only read the last section (which may have 0 entries), causing object N not found in xref table errors.
The parser now walks the full chain, merging entries with newer-wins semantics per PDF 1.7 spec Section 7.5.6.
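A simplified sketch of such a chain walk, with newer-wins merging, cycle detection, and a depth cap. The XrefSection type and mergeChain function are hypothetical. Using offset 0 to mean "no /Prev" and stopping quietly on a cycle or at the cap are assumptions made for the example, and the parser may report these cases differently.

```go
package main

import "fmt"

// XrefSection is a simplified cross-reference section.
type XrefSection struct {
	Entries map[int]int64 // object number -> byte offset
	Prev    int64         // offset of the previous section, 0 if none
}

const maxChainDepth = 100

// mergeChain walks from the newest section through /Prev links and
// merges entries so that the newest definition of each object wins.
func mergeChain(sections map[int64]*XrefSection, start int64) map[int]int64 {
	merged := make(map[int]int64)
	seen := make(map[int64]bool)
	off := start
	for depth := 0; off != 0 && depth < maxChainDepth; depth++ {
		if seen[off] {
			break // cycle in the /Prev chain
		}
		seen[off] = true
		sec, ok := sections[off]
		if !ok {
			break // dangling /Prev offset
		}
		for num, pos := range sec.Entries {
			if _, exists := merged[num]; !exists {
				merged[num] = pos // older entries never overwrite newer ones
			}
		}
		off = sec.Prev
	}
	return merged
}

func main() {
	sections := map[int64]*XrefSection{
		300: {Entries: map[int]int64{}, Prev: 200}, // newest section, 0 entries
		200: {Entries: map[int]int64{1: 150, 2: 180}, Prev: 100},
		100: {Entries: map[int]int64{1: 15, 3: 60}, Prev: 200}, // malformed: cycle
	}
	fmt.Println(mergeChain(sections, 300)) // prints map[1:150 2:180 3:60]
}
```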
Closes #19
Install
go get github.com/coregx/gxpdf@v0.2.1
GxPDF v0.2.1 (2026-02-05T14:48:46Z)
Enterprise-grade PDF library for Go!
- 100% table extraction accuracy on bank statements
- PDF merge, split, text extraction
- AES-256 encryption support
Changelog
🐛 Bug Fixes
Installation:
# CLI tool:
go install github.com/coregx/gxpdf/cmd/gxpdf@v0.2.1
# Library:
go get github.com/coregx/gxpdf@v0.2.1
# Or download binary for your platform above