Releases: coregx/gxpdf

v0.8.2

21 May 18:04
bff6bc2

v0.8.2

Follow-up patch for #74, fixing two additional issues discovered by testing with real-world shipping label PDFs provided by @iv7dev.

Bug Fixes

Per-glyph text merging (wkhtmltopdf)

PDFs generated by wkhtmltopdf 0.12 / Qt 4.8 emit one Tj operator per glyph with Td repositioning between each character. This caused text extraction to produce "D O M I C I L I O" instead of "DOMICILIO".

Fix: New mergeAdjacentElements() collapses consecutive single-glyph TextElements that share the same font, size, and text line into word-level elements. New AssembleText() replaces the naive elem.Text + " " concatenation with spatial gap detection: a space is inserted only when the horizontal gap between elements exceeds a font-size-based threshold, and a newline is inserted when the Y coordinate changes.
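
The gap-detection idea can be sketched in a few lines of Go. This is only an illustration under stated assumptions, not the library's implementation: the TextElement struct, the assembleText function, and the gapFactor parameter (a fraction of the font size) are hypothetical names chosen for the example, and the half-font-size baseline tolerance is likewise an assumption.

```go
package main

import (
	"fmt"
	"math"
	"strings"
)

// TextElement is a hypothetical stand-in for a positioned text run.
// The field names are illustrative and not taken from gxpdf.
type TextElement struct {
	Text     string
	X, Y     float64 // left edge and baseline
	Width    float64
	FontSize float64
}

// assembleText joins elements that are assumed to be in reading order.
// A newline is emitted when the baseline moves by more than half the font
// size; a space is emitted only when the horizontal gap to the previous
// element exceeds gapFactor * font size. Both thresholds are assumptions.
func assembleText(elems []TextElement, gapFactor float64) string {
	var sb strings.Builder
	for i, e := range elems {
		if i > 0 {
			prev := elems[i-1]
			switch {
			case math.Abs(e.Y-prev.Y) > 0.5*prev.FontSize:
				sb.WriteByte('\n')
			case e.X-(prev.X+prev.Width) > gapFactor*prev.FontSize:
				sb.WriteByte(' ')
			}
		}
		sb.WriteString(e.Text)
	}
	return sb.String()
}

func main() {
	elems := []TextElement{
		{"D", 10, 100, 7, 10}, {"O", 17, 100, 7, 10}, {"M", 24, 100, 8, 10},
		{"1", 40, 100, 5, 10}, // 8 pt gap after "M": treated as a word break
		{"X", 10, 88, 7, 10},  // lower baseline: treated as a new line
	}
	fmt.Printf("%q\n", assembleText(elems, 0.25))
}
```

With per-glyph input like this, touching glyphs are concatenated without separators, which is the behavior the fix describes for wkhtmltopdf output.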

Form XObject text extraction (TCPDF)

PDFs generated by TCPDF 5.9 place all page content inside Form XObjects invoked via the Do operator. Since Do was not handled by the text extractor, all pages with CID fonts returned empty strings.

Fix: New processFormXObject() resolves named XObjects from page resources, pushes the XObject's own /Resources dictionary onto a stack (so CID font decoders are correctly loaded from the XObject scope), processes operators recursively, and pops resources on return. An 8-level depth guard prevents infinite recursion on malformed PDFs.
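
The push, recurse, pop pattern with a depth guard might look roughly like the sketch below. Everything here is a simplified model for illustration: formXObject, extractor, processForm, and errTooDeep are hypothetical names, the "content" of an XObject is reduced to a string plus a list of nested Do invocations, and returning an error at the depth limit is an assumption (the real guard may simply stop descending).

```go
package main

import (
	"errors"
	"fmt"
)

const maxFormDepth = 8 // mirrors the 8-level guard described above

// formXObject is a hypothetical, heavily simplified Form XObject.
type formXObject struct {
	text      string                  // text shown by this XObject
	resources map[string]*formXObject // the XObject's own /Resources
	invokes   []string                // names passed to nested Do operators
}

type extractor struct {
	resStack []map[string]*formXObject // innermost scope is last
	out      []string
}

var errTooDeep = errors.New("form XObject nesting exceeds depth guard")

// processForm resolves name in the current resource scope, pushes the
// XObject's own resources, processes its content, and pops on return.
func (e *extractor) processForm(name string, depth int) error {
	if depth >= maxFormDepth {
		return errTooDeep
	}
	scope := e.resStack[len(e.resStack)-1]
	xo, ok := scope[name]
	if !ok {
		return nil // unknown name: nothing to extract
	}
	e.resStack = append(e.resStack, xo.resources)                  // push
	defer func() { e.resStack = e.resStack[:len(e.resStack)-1] }() // pop
	e.out = append(e.out, xo.text)
	for _, child := range xo.invokes {
		if err := e.processForm(child, depth+1); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	inner := &formXObject{text: "CLASICO"}
	outer := &formXObject{
		text:      "label:",
		resources: map[string]*formXObject{"Fm1": inner},
		invokes:   []string{"Fm1"},
	}
	e := &extractor{resStack: []map[string]*formXObject{{"Fm0": outer}}}
	err := e.processForm("Fm0", 0)
	fmt.Println(e.out, err)
}
```

Because the pop is deferred, the resource stack is restored even when a nested call bails out early, so a self-referencing XObject cannot leave the extractor in a corrupted scope.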

Verified On

| PDF | Generator | Fonts | Before | After |
|---|---|---|---|---|
| Andreani shipping label (12 pages) | wkhtmltopdf 0.12 | Roboto-Medium/Bold, CID TrueType, Identity-H | "D O M I C I L I O" | "DOMICILIO" |
| Correo Argentino label (5 pages) | TCPDF 5.9 | DejaVuSans-Bold/Oblique (subset), CID TrueType, Identity-H | "" (empty) | "CLASICO\nF.PAGADO 0001571918..." |

Stats

  • 5 files changed, +1393 lines (including 42 new tests with real-world PDF fixtures)
  • Zero breaking changes

Install

go get github.com/coregx/gxpdf@v0.8.2

v0.8.1

21 May 09:19
899eaa3

v0.8.1

Patch release fixing CID TrueType font text extraction.

Bug Fix

CID TrueType text extraction now works correctly (#74, reported by @iv7dev)

ExtractTextFromPage() previously returned raw CID glyph IDs (\x01 \x02 \x03) or empty strings for PDFs with CID TrueType fonts using Identity-H encoding, which is common in documents generated by wkhtmltopdf, TCPDF, and similar tools. It now decodes these correctly to Unicode text.

What was fixed

  • beginbfrange array format — ToUnicode CMap entries in [<dst0> <dst1> ...] array form were silently skipped, producing an empty mapping table. Now fully parsed.
  • Identity-H 2-byte detection — all loadFontDecoder fallback paths now correctly enable 2-byte glyph mode for Identity-H/V and Type0 composite fonts.
  • begincodespacerange parsing — byte width is now determined from the PDF's codespace range declaration (the authoritative signal per PDF spec §9.7.5).
  • UTF-16BE surrogate pairs — CMap destinations encoding supplementary Unicode characters (U+10000+) are now correctly decoded via utf16.DecodeRune.
  • Composite font guard — the looksLikeGarbage heuristic no longer downgrades Type0 composite fonts from 2-byte to 1-byte decoding.
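
To illustrate the surrogate-pair item, here is a minimal sketch of decoding a ToUnicode destination stored as big-endian UTF-16 bytes. The function name decodeUTF16BE is hypothetical; the sketch uses the standard library's utf16.Decode, which combines valid surrogate pairs the same way utf16.DecodeRune does and substitutes U+FFFD for unpaired surrogates.

```go
package main

import (
	"fmt"
	"unicode/utf16"
)

// decodeUTF16BE converts a CMap destination given as big-endian UTF-16
// bytes into a Go string. A trailing odd byte, if any, is ignored.
func decodeUTF16BE(b []byte) string {
	units := make([]uint16, 0, len(b)/2)
	for i := 0; i+1 < len(b); i += 2 {
		units = append(units, uint16(b[i])<<8|uint16(b[i+1]))
	}
	// utf16.Decode merges surrogate pairs into single code points.
	return string(utf16.Decode(units))
}

func main() {
	fmt.Println(decodeUTF16BE([]byte{0x00, 0x41}))             // BMP character
	fmt.Println(decodeUTF16BE([]byte{0xD8, 0x3D, 0xDE, 0x00})) // surrogate pair for U+1F600
}
```

Treating each 16-bit unit as its own character would turn the second input into two invalid code points, which is the failure mode the fix addresses.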

Affected PDFs

Any PDF using CID TrueType fonts with Identity-H encoding — this includes most PDFs generated by:

  • wkhtmltopdf (Roboto, DejaVu Sans, etc.)
  • TCPDF (DejaVuSans subsets)
  • Prince XML
  • Many CJK-capable PDF generators

Stats

  • 4 files changed, +891 lines (including 33 new tests)
  • Zero breaking changes

Install

go get github.com/coregx/gxpdf@v0.8.1

v0.8.0

07 May 18:44
d91ac2e

v0.8.0 "Extraction & Access"

Community-driven release delivering three major extraction capabilities requested by @joa23, plus a bug fix from @yurikilian.

Highlights

Positioned Text Extraction (#68)

Extract text with full positional metadata — coordinates, dimensions, font name, and font size:

elements, _ := doc.ExtractTextElementsFromPage(1)
for _, e := range elements {
    fmt.Printf("%q at (%.1f, %.1f) font=%s size=%.1f\n",
        e.Text, e.X, e.Y, e.FontName, e.FontSize)
}

In-Memory PDF Opening (#68)

Open PDFs from []byte without writing to disk — ideal for server-side workflows:

data, _ := io.ReadAll(httpResponse.Body)
doc, _ := gxpdf.OpenFromBytes(data)
defer doc.Close()

Also supports encrypted PDFs: OpenFromBytesWithPassword(data, "secret").

Embedded Font Extraction (#67)

Extract TTF/OTF font binary data for round-trip preservation:

fonts, _ := doc.GetEmbeddedFonts()
for _, f := range fonts {
    fmt.Printf("Font: %s (%s), %d bytes\n", f.Name, f.Subtype, len(f.Data))
}

Supports TrueType and Type0/CIDFontType2 fonts. Extracted data can be fed back into the Creator API via fonts.LoadTTFFromBytes().

Vector Graphics Extraction (#66)

Extract paths, bezier curves, colors, and opacity from parsed PDFs:

paths, _ := doc.GetVectorGraphicsForPage(1)
for _, p := range paths {
    fmt.Printf("Path: %d verbs, mode=%v, stroke=%v\n",
        len(p.Verbs), p.PaintMode, p.StrokeColor)
}

Features:

  • Path verb + coordinates model (MoveTo, LineTo, CubicTo, QuadTo, Close) — compatible with gogpu/gg
  • Graphics state stack (q/Q), CTM tracking (cm), CMYK→RGB conversion
  • Stroke, fill, and fill-stroke paint modes with separate colors and opacity
  • Line cap, line join, miter limit extraction
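
The release notes do not say which CMYK→RGB formula is used. As one possibility, the simple device conversion described in the PDF specification subtracts the clamped sum of each ink and black from 1. The sketch below shows that formula only; cmykToRGB is a hypothetical name, and color-managed (ICC-based) conversion would give different results.

```go
package main

import "fmt"

// cmykToRGB applies the simple DeviceCMYK to DeviceRGB conversion:
// each channel is 1 - min(1, ink + black). All components are in [0, 1].
// Whether gxpdf uses exactly this formula is an assumption.
func cmykToRGB(c, m, y, k float64) (r, g, b float64) {
	clamp := func(v float64) float64 {
		if v > 1 {
			return 1
		}
		return v
	}
	return 1 - clamp(c+k), 1 - clamp(m+k), 1 - clamp(y+k)
}

func main() {
	r, g, b := cmykToRGB(0, 1, 1, 0) // full magenta + yellow
	fmt.Println(r, g, b)
}
```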

Bug Fixes

  • uint16 overflow in FontSubset.MeasureString — long strings no longer produce incorrect width calculations, fixing text wrapping in Builder API (#69, @yurikilian)
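
The class of bug is easy to reproduce in isolation. The sketch below is not the code from FontSubset.MeasureString (which is not shown in these notes); measureNarrow and measureWide are hypothetical functions that only demonstrate how a uint16 accumulator wraps around once the summed glyph advances pass 65535.

```go
package main

import "fmt"

// measureNarrow sums glyph advances in a uint16 and silently wraps
// around once the total exceeds 65535.
func measureNarrow(advances []uint16) uint16 {
	var total uint16
	for _, a := range advances {
		total += a
	}
	return total
}

// measureWide widens each advance before adding, so long strings
// keep their true width.
func measureWide(advances []uint16) int {
	total := 0
	for _, a := range advances {
		total += int(a)
	}
	return total
}

func main() {
	// 200 glyphs at 600 font units each: the true width is 120000.
	advances := make([]uint16, 200)
	for i := range advances {
		advances[i] = 600
	}
	fmt.Println(measureNarrow(advances), measureWide(advances))
}
```

A wrapped total makes a long line look shorter than it is, which is consistent with the text-wrapping symptom reported for the Builder API.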

Internal Changes

  • Parser Reader refactored from *os.File to io.ReadSeeker interface
  • Shared decodeStreamData() utility for FlateDecode decompression
  • New VectorParser separate from table detection GraphicsParser

Stats

  • +4,000 lines of new production code
  • 130 new tests across 6 packages
  • All CI checks passing (Ubuntu, macOS, Windows)

Install

go get github.com/coregx/gxpdf@v0.8.0

v0.7.0

24 Mar 08:54
eb73536

v0.7.0 "Builder & Signatures"

The largest feature release in GxPDF history — ~13,500 lines of new code across 3 major features.

Highlights

Declarative Builder API

QuestPDF-inspired document builder with 12-column grid, automatic pagination, and composable Go closures. Users import only builder/ — no internal packages leak.

Enterprise Tables

Full-featured tables with ColSpan, RowSpan, repeating headers on overflow pages, page splitting, Auto/Fixed/Fr column widths, cell padding/borders/backgrounds, and zebra stripes.

Rich Text

Multi-style inline text — mix bold, italic, colors, and links within a single paragraph. Baseline alignment for mixed font sizes, justified text with proportional spacing.

Digital Signatures (PAdES B-B + B-T)

Sign and verify PDFs with zero external dependencies. RSA + ECDSA support, CMS/PKCS#7 with ESS signing-certificate-v2, RFC 3161 timestamping, incremental PDF update preserving existing signatures.

What's New

  • builder/ — Declarative Builder API (12-col grid, Row/Col, Text, Tables, RichText, Images, Lines, Spacers)
  • layout/ — Pure computation layout engine (zero PDF dependencies)
  • signature/ — Digital signatures (PAdES B-B/B-T, sign + verify)
  • Enterprise Tables — ColSpan, RowSpan, header repeat, page split, Auto/Fixed/Pct/Fr columns
  • Rich Text: RichText() with Span() and Link() for mixed inline styles
  • Text Measurement API: MeasureText(), FontAscender(), FontLineHeight() exported
  • Builder-owned types: Value, Color, Size (no layout/ import needed)
  • Page break control: PageBreak(), KeepTogether(), EnsureSpace()
  • Page numbers — Two-pass placeholder resolution
  • 13 predefined colors + Hex() parser

Bug Fixes

  • Half-leading for optically centered text in line boxes (CSS model)
  • Pct double-resolution fix in nested box layout
  • Floating-point epsilon in text overflow check

Stats

| Package | Production lines | Test lines | Coverage |
|---|---|---|---|
| layout/ | ~2,600 | ~2,500 | 85.7% |
| builder/ | ~2,600 | ~2,200 | 80.6% |
| signature/ | 1,886 | 1,268 | 80.7% |

Full Changelog: v0.6.0...v0.7.0

v0.6.0

24 Feb 23:43
648b20d

GxPDF v0.6.0 — Encrypted PDF Reading & Gradient Rendering

Two major features in this release: transparent reading of password-protected PDFs and full gradient rendering via PDF Shading dictionaries.

Highlights

  • Encrypted PDF Reading — open password-protected PDFs with Open() or OpenWithPassword() (#34)
  • Full Gradient Rendering — linear and radial gradients now render as real color transitions, not solid colors (#57)
  • ExtGState Fix — shape and text opacity produce valid PDF output (#46, #47)

What's New

Encrypted PDF Reading (#34)

Read PDFs encrypted with Standard Security Handler:

| Algorithm | Version | Status |
|---|---|---|
| RC4 40-bit | V=1, R=2 | Supported |
| RC4 128-bit | V=2, R=3 | Supported |
| AES-128 | V=4, R=4 | Supported |

  • Open() transparently handles empty-password encrypted PDFs (permissions-only — most common case)
  • OpenWithPassword() for PDFs with non-empty user passwords
  • ErrPasswordRequired sentinel error for wrong/missing password

// Permissions-only PDFs open transparently
doc, _ := gxpdf.Open("bank_statement.pdf")

// Password-protected PDFs
doc, err := gxpdf.OpenWithPassword("protected.pdf", "secret")
if errors.Is(err, gxpdf.ErrPasswordRequired) {
    log.Fatal("Wrong password")
}

Full Gradient Rendering (#57)

Gradients now render as real PDF Shading objects instead of solid-color fallback:

  • Linear gradients — ShadingType 2 (axial) with multi-stop support
  • Radial gradients — ShadingType 3 with focal point control
  • Multi-stop — Type 3 stitching functions for 3+ color stops
  • All shapes — rectangles, circles, ellipses, polygons, Bezier curves

grad := creator.NewLinearGradient(50, 650, 250, 650)
grad.AddColorStop(0, creator.Red)
grad.AddColorStop(0.5, creator.Yellow)
grad.AddColorStop(1, creator.Green)

page.DrawRect(50, 620, 200, 60, &creator.RectOptions{
    FillGradient: grad,
})

What's Fixed

  • ExtGState Object Creation — opacity on shapes and text now produces valid PDF indirect objects (#46, #47)

Installation

# CLI tool:
go install github.com/coregx/gxpdf/cmd/gxpdf@v0.6.0

# Library:
go get github.com/coregx/gxpdf@v0.6.0

Full Changelog: https://github.com/coregx/gxpdf/blob/main/CHANGELOG.md

v0.5.1

23 Feb 18:58
b19b7bd

v0.5.1 — ExtGState Hotfix

Fixes shape and text opacity producing invalid PDF output (#46, #47).

What was broken

ExtGState objects (used for transparency) were registered in the resource dictionary but never created as actual PDF objects. This caused /GS1 0 0 R references — PDF viewers silently ignored them, making all opacity settings invisible.

What's fixed

  • ExtGState indirect objects are now properly created with correct /ca and /CA opacity values
  • Both text-only and graphics+text rendering paths are fixed
  • All shape types affected: circles, rectangles, ellipses, polygons, lines, Bezier curves
  • Text opacity via AddTextColorAlpha and custom font variants

Upgrade

go get github.com/coregx/gxpdf@v0.5.1

v0.5.0

23 Feb 10:39
c934ee6

v0.5.0 — "Opacity & Bezier"

Third feature release adding transparency controls and quadratic Bezier curves.

Highlights

  • Text Opacity — render semi-transparent text for watermarks and overlays
  • Quadratic Bezier Curves — native quadratic curves with exact cubic conversion
  • Shape Opacity Fix — opacity on shapes now works correctly across all shape types

What's New

Text Opacity (#46)

New methods for rendering text with transparency via ExtGState (/ca, /CA):

page.AddTextColorAlpha("Watermark", 200, 400, creator.Helvetica, 48, creator.Gray, 0.3)

All variants supported:

  • AddTextColorAlpha — standard font + color + opacity
  • AddTextColorRotatedAlpha — with rotation
  • AddTextCustomFontColorAlpha — custom TTF/OTF font
  • AddTextCustomFontColorRotatedAlpha — custom font + rotation

Quadratic Bezier Curves (#45)

DrawQuadBezierCurve with multi-segment paths:

segments := []creator.QuadBezierSegment{
    {
        Start:   creator.Point{X: 100, Y: 100},
        Control: creator.Point{X: 150, Y: 200},
        End:     creator.Point{X: 250, Y: 100},
    },
}
page.DrawQuadBezierCurve(segments, &creator.BezierOptions{
    Color: creator.Blue, Width: 2.0,
})

Converts to cubic via exact degree elevation (PDF only supports cubic natively). seg.ToCubic() available for manual conversion. Full styling: stroke, fill, dash, opacity, gradients.
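
Degree elevation has a closed form: the cubic keeps the quadratic's endpoints, and each cubic control point lies two thirds of the way from an endpoint toward the quadratic control point. The sketch below shows that arithmetic with a local Point type and a hypothetical quadToCubic function; it is independent of the library's seg.ToCubic(), whose signature is not shown here.

```go
package main

import "fmt"

// Point is a local 2D point type for this sketch.
type Point struct{ X, Y float64 }

// quadToCubic returns the two cubic control points that reproduce the
// quadratic Bezier (p0, q, p2) exactly:
//   c1 = p0 + 2/3 * (q - p0)
//   c2 = p2 + 2/3 * (q - p2)
// The cubic's endpoints are p0 and p2, unchanged.
func quadToCubic(p0, q, p2 Point) (c1, c2 Point) {
	const t = 2.0 / 3.0
	c1 = Point{p0.X + t*(q.X-p0.X), p0.Y + t*(q.Y-p0.Y)}
	c2 = Point{p2.X + t*(q.X-p2.X), p2.Y + t*(q.Y-p2.Y)}
	return c1, c2
}

func main() {
	// Same segment as the example above.
	c1, c2 := quadToCubic(Point{100, 100}, Point{150, 200}, Point{250, 100})
	fmt.Printf("c1=(%.2f, %.2f) c2=(%.2f, %.2f)\n", c1.X, c1.Y, c2.X, c2.Y)
}
```

The conversion is exact rather than an approximation, so the rendered curve is identical to the quadratic that was specified.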

What's Fixed

Shape Opacity (#47)

The Opacity field on shape option structs (circles, ellipses, rectangles, polygons, polylines, lines, Bezier curves) was silently dropped during the creator→writer conversion. Fixed by propagating opacity through the full pipeline: convertOptions → writer.GraphicsOp → ExtGState gs operator.

Full Changelog

https://github.com/coregx/gxpdf/blob/main/CHANGELOG.md#050---2026-02-23-opacity--bezier

v0.4.0

21 Feb 07:47
1e84281

v0.4.0 — Creator API

Focused on Creator API enhancements requested by the community (#41, #42).

Highlights

  • 35+ built-in page sizes — ISO A/B/C, ANSI, photo, book, JIS, envelopes, presentation slides
  • Custom page dimensions: NewPageWithDimensions(widthPt, heightPt) with unit helpers
  • Landscape orientation: NewPageWithSize(A4, Landscape) using swapped-MediaBox
  • Text rotation: AddTextRotated with both standard 14 and custom TTF fonts
  • Angle normalization — negative and out-of-range angles normalized to [0, 360)

New Features

Page Sizes & Dimensions

// 35+ built-in sizes with IDE autocomplete
page, _ := c.NewPageWithSize(creator.A4)
page, _ = c.NewPageWithSize(creator.Slide16x9)   // 960×540 pt (PowerPoint widescreen)
page, _ = c.NewPageWithSize(creator.USTradeBook)  // 6×9 inches

// Custom dimensions with unit conversion
page, _ = c.NewPageWithDimensions(creator.InchesToPoints(8.5), creator.InchesToPoints(11))
page, _ = c.NewPageWithDimensions(creator.MMToPoints(200), creator.MMToPoints(300))

// Landscape orientation (swapped MediaBox, not /Rotate)
page, _ = c.NewPageWithSize(creator.A4, creator.Landscape)
page, _ = c.NewPageWithSize(creator.Letter, creator.Landscape)

Text Rotation

// Vertical text (90° counter-clockwise, PDF convention)
page.AddTextRotated("Sidebar", 50, 400, creator.Helvetica, 14, 90)

// With color
page.AddTextColorRotated("DRAFT", 300, 400, creator.HelveticaBold, 48, creator.Red, 45)

// Custom TTF font + rotation
page.AddTextCustomFontRotated("サイドバー", 50, 400, myFont, 14, 90)

// Negative angles normalized: -90 → 270, both produce identical output
page.AddTextRotated("Text", 100, 400, creator.Helvetica, 14, 270)  // same as -90
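
Normalizing an angle to [0, 360) is a small piece of arithmetic. The sketch below shows one way to do it with math.Mod; normalizeAngle is a hypothetical helper for illustration, not a function exported by the library.

```go
package main

import (
	"fmt"
	"math"
)

// normalizeAngle maps any angle in degrees to the half-open range [0, 360).
func normalizeAngle(deg float64) float64 {
	a := math.Mod(deg, 360) // result has the sign of deg, magnitude < 360
	if a < 0 {
		a += 360
	}
	if a >= 360 { // guards against rounding when a was a tiny negative value
		a = 0
	}
	return a
}

func main() {
	for _, d := range []float64{-90, 270, 450, 360, -720} {
		fmt.Printf("%v -> %v\n", d, normalizeAngle(d))
	}
}
```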

Unit Conversion Helpers

creator.InchesToPoints(8.5)    // 612 pt
creator.MMToPoints(210)        // ~595 pt
creator.CMToPoints(21)         // ~595 pt
creator.PointsToInches(612)    // 8.5 in
creator.PointsToMM(595)       // ~210 mm
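
The arithmetic behind these helpers follows from two constants: a PDF point is 1/72 inch, and an inch is 25.4 mm. The sketch below re-derives the values in the comments above using local, hypothetical functions (inchesToPoints, mmToPoints, pointsToMM); it does not call the creator package.

```go
package main

import "fmt"

const (
	pointsPerInch = 72.0
	mmPerInch     = 25.4
)

// inchesToPoints converts inches to PDF points (1 pt = 1/72 in).
func inchesToPoints(in float64) float64 { return in * pointsPerInch }

// mmToPoints converts millimetres to PDF points via inches.
func mmToPoints(mm float64) float64 { return mm / mmPerInch * pointsPerInch }

// pointsToMM is the reverse of mmToPoints.
func pointsToMM(pt float64) float64 { return pt / pointsPerInch * mmPerInch }

func main() {
	fmt.Printf("8.5 in = %.2f pt\n", inchesToPoints(8.5))
	fmt.Printf("210 mm = %.2f pt\n", mmToPoints(210))
	fmt.Printf("595 pt = %.2f mm\n", pointsToMM(595))
}
```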

What's Changed

  • 35+ page sizes with map-based architecture (single source of truth)
  • Orientation type with Portrait/Landscape constants
  • NewPageWithDimensions for arbitrary page sizes
  • AddTextRotated, AddTextColorRotated for standard 14 fonts
  • AddTextCustomFontRotated, AddTextCustomFontColorRotated for TTF/OTF fonts
  • Angle normalization to [0, 360) per ISO 32000 §8.3
  • Reverse unit conversions: PointsToInches, PointsToMM, PointsToCM
  • Fix: use fmt.Fprintf instead of WriteString(Sprintf) (staticcheck QF1012)

Contributors

Thanks to @ajstarks for the feature requests and API feedback that shaped this release.

Full Changelog: v0.3.0...v0.4.0

v0.3.0

16 Feb 17:11

GxPDF v0.3.0 "Parser Hardening"

Major parser robustness improvements, rendering fixes, and developer experience enhancements.

Highlights

  • Logging Package — slog-based configurable logging (silent by default)
  • Image Rendering: DrawImage() / DrawImageFit() now produce visible images (JPEG + PNG + alpha)
  • Watermark Rendering — text watermarks with rotation and opacity
  • Error Propagation — public API no longer silently swallows errors
  • Parser Hardening — 11 community PRs fixing edge cases and a DoS vulnerability

What's New

Logging (logging/ package)

  • logging.SetLogger() / logging.Logger() API
  • Silent by default — opt-in via any slog.Handler
  • Convenience methods (ExtractText, ExtractTables, GetImages) log errors via slog

Image XObject Rendering (fixes #36)

  • DrawImage() and DrawImageFit() now work correctly in Writer
  • JPEG via /Filter /DCTDecode, PNG via /Filter /FlateDecode
  • Alpha channel support via /SMask

Watermark Rendering

  • Text watermarks with rotation, opacity, and font support
  • ExtGState for transparency

What's Fixed

Error Propagation (fixes #35)

  • ExtractTextFromPage() now returns actual errors instead of empty strings
  • All convenience methods log errors via slog instead of silently discarding them

Parser Robustness (11 PRs by @mikeschinkel)

  • Leading whitespace before %PDF- header
  • CR line endings in startxref
  • Trailing garbage after %%EOF (progressive search)
  • CMap uint16 infinite loop — DoS vulnerability fix
  • Token position after indirect Length
  • Progressive xref stream buffer (1KB → 4KB)
  • /W [0 0 0] in xref streams
  • PNG predictor support — all 5 filter types
  • Off-by-one xref object recovery with lenient parsing
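
For readers unfamiliar with the PNG predictor item: with a PNG predictor, each decoded row is prefixed by a filter-type byte (0 None, 1 Sub, 2 Up, 3 Average, 4 Paeth), and the row must be reconstructed from its left and upper neighbours. The sketch below follows the PNG specification's definitions; unfilterRow and paeth are hypothetical names and this is not the library's decoder.

```go
package main

import "fmt"

func abs(v int) int {
	if v < 0 {
		return -v
	}
	return v
}

// paeth picks whichever of left (a), up (b), upper-left (c) is closest
// to a + b - c, preferring a, then b, on ties.
func paeth(a, b, c byte) byte {
	p := int(a) + int(b) - int(c)
	pa, pb, pc := abs(p-int(a)), abs(p-int(b)), abs(p-int(c))
	switch {
	case pa <= pb && pa <= pc:
		return a
	case pb <= pc:
		return b
	default:
		return c
	}
}

// unfilterRow reverses one PNG filter in place. cur holds the filtered
// row without its leading filter-type byte, prev is the reconstructed
// previous row (all zeros for the first row), bpp is bytes per pixel.
func unfilterRow(ft byte, cur, prev []byte, bpp int) error {
	for i := range cur {
		var a, c byte // left and upper-left are zero at the row start
		if i >= bpp {
			a, c = cur[i-bpp], prev[i-bpp]
		}
		b := prev[i]
		switch ft {
		case 0: // None
		case 1: // Sub
			cur[i] += a
		case 2: // Up
			cur[i] += b
		case 3: // Average
			cur[i] += byte((int(a) + int(b)) / 2)
		case 4: // Paeth
			cur[i] += paeth(a, b, c)
		default:
			return fmt.Errorf("unknown PNG filter type %d", ft)
		}
	}
	return nil
}

func main() {
	row := []byte{1, 1, 1}
	err := unfilterRow(1, row, make([]byte, 3), 1) // Sub filter
	fmt.Println(row, err)
}
```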

Full Changelog

See CHANGELOG.md for details.


Installation:

# CLI tool:
go install github.com/coregx/gxpdf/cmd/gxpdf@v0.3.0

# Library:
go get github.com/coregx/gxpdf@v0.3.0

# Or download binary for your platform above

Quick Start (CLI):

# Extract tables (100% accuracy!):
gxpdf tables invoice.pdf --format csv

# PDF info:
gxpdf info document.pdf

# Merge PDFs:
gxpdf merge doc1.pdf doc2.pdf -o combined.pdf

Quick Start (Library):

doc, _ := gxpdf.Open("invoice.pdf")
defer doc.Close()

tables := doc.ExtractTables()
for _, t := range tables {
    fmt.Println(t.Rows())
}

Documentation: https://github.com/coregx/gxpdf#readme

Report Issues: https://github.com/coregx/gxpdf/issues

v0.2.1

05 Feb 14:49
f1559f2

Hotfix: Hybrid-Reference PDF Support

Fixes parsing of MS Word-generated PDFs that use incremental updates with /Prev chain and /XRefStm hybrid cross-reference structure.

Fixed

  • /Prev Chain Support — follow trailer /Prev links to merge all cross-reference sections from incremental updates
  • /XRefStm Support — parse supplementary cross-reference streams in hybrid-reference PDFs
  • Cycle Detection — prevent infinite loops on malformed /Prev chains
  • Depth Limit — cap chain traversal at 100 levels

Details

MS Word (and other editors) save PDFs with incremental updates, producing multiple xref sections linked via /Prev. GxPDF v0.2.0 only read the last section (which may have 0 entries), causing object N not found in xref table errors.

The parser now walks the full chain, merging entries with newer-wins semantics per PDF 1.7 spec Section 7.5.6.
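
A chain walk with newer-wins merging, cycle detection, and a depth cap can be sketched as follows. The model is deliberately simplified and hypothetical: section, mergeChain, and the map keyed by byte offset stand in for real xref parsing, and returning an error on a cycle or an over-long chain is an assumption (the parser may instead stop and keep what it has merged so far).

```go
package main

import (
	"errors"
	"fmt"
)

const maxPrevDepth = 100 // mirrors the 100-level cap described above

// section is a hypothetical, simplified xref section: object number to
// byte offset, plus the offset of the previous section (0 means none).
type section struct {
	entries map[int]int64
	prev    int64
}

// mergeChain walks from the newest section back through /Prev links.
// Entries from newer sections win because they are inserted first and
// never overwritten by older ones.
func mergeChain(sections map[int64]section, start int64) (map[int]int64, error) {
	merged := make(map[int]int64)
	seen := make(map[int64]bool)
	off := start
	for depth := 0; off != 0; depth++ {
		if depth >= maxPrevDepth {
			return nil, errors.New("xref /Prev chain too deep")
		}
		if seen[off] {
			return nil, errors.New("cycle in xref /Prev chain")
		}
		seen[off] = true
		sec, ok := sections[off]
		if !ok {
			return nil, fmt.Errorf("no xref section at offset %d", off)
		}
		for num, pos := range sec.entries {
			if _, exists := merged[num]; !exists {
				merged[num] = pos
			}
		}
		off = sec.prev
	}
	return merged, nil
}

func main() {
	sections := map[int64]section{
		300: {entries: map[int]int64{1: 250}, prev: 100},       // incremental update
		100: {entries: map[int]int64{1: 15, 2: 60}, prev: 0}, // original file
	}
	merged, err := mergeChain(sections, 300)
	fmt.Println(merged, err)
}
```

In the example, object 1 resolves to the offset from the newer section while object 2, which only the original section defines, is still found.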

Closes #19

Install

go get github.com/coregx/gxpdf@v0.2.1

GxPDF v0.2.1 (2026-02-05T14:48:46Z)

Enterprise-grade PDF library for Go!

  • 100% table extraction accuracy on bank statements
  • PDF merge, split, text extraction
  • AES-256 encryption support

Changelog

🐛 Bug Fixes

  • f5f2a32: fix: support /Prev chain and /XRefStm in hybrid-reference PDFs (@kolkov)

Installation:

# CLI tool:
go install github.com/coregx/gxpdf/cmd/gxpdf@v0.2.1

# Library:
go get github.com/coregx/gxpdf@v0.2.1

# Or download binary for your platform above
