Generate tagged PDF documents for accessibility#6619

Merged
laurmaedje merged 52 commits into main from pdf-accessibility
Oct 2, 2025

@saecki
Member

@saecki saecki commented Jul 16, 2025

This will be the first PR in a larger effort to make PDF documents generated by Typst compliant with PDF/UA-1 and more accessible in general. Tagging of PDF documents will be enabled by default to ensure a baseline of accessibility and the --pdf-standard=ua-1 flag can be used in the CLI to enable further checks demanded by the PDF/UA-1 specification. Note however that this cannot guarantee full compliance, since some parts can only reasonably be validated by a human.

Here is the checklist currently used for development. Some of the elements are listed only for completeness' sake and may not require any implementation:

  • investigate introspector performance due to increased number of tags
  • elements
    • heading
      • initial support
      • don't generate Lbl for numbering (doesn't work well with screen readers)
      • PDF/UA-1 7.4.2 descending heading level check
    • outline
      • generate nested TOCs for different levels of headings
      • mark fill as artifact
      • generate Lbl for numbering
    • table
      • mark table gutters as artifacts
      • generate TH tags for table header rows
      • group THead, TFoot, and TBody row groups
      • some temporary hacky way to set the table-header-scope on cells
        • add pdf.header-cell and pdf.data-cell
        • add feature flag
      • also hide table summary behind a feature flag
      • ignore repeated headers/footers
      • generate Headers attribute
      • generate BBox
      • fix table.header with multiple rows
      • border and background attributes
      • use last repeated footers
    • figure
      • captions
      • generate BBox
    • image
      • wrap in figure tag if not already inside one
    • title
    • terms
    • lists
    • enumerations
    • bibliography
    • quote
    • ref
    • cite
    • footnotes
    • grid
    • inline
    • shapes, line, path, and curve
    • text layout elements
      • strong, emph
      • underline, overline, strike
      • sub, sup
        • write baseline shift and line height
        • decide what to do with typographic scripts
      • highlight
    • symbol
    • repeat: mark as artifact
    • raw
    • math
      • generate a formula tag
      • allow specifying an alternative description
      • generate BBox
    • place
    • hide
      • don't generate tags
    • links
      • only generate quadpoints for links in PDF/UA mode
        • most PDF readers (even Acrobat) don't handle them properly
      • link annotation alt description (Contents attribute) is required
        • generate alt text where applicable
      • better alt description for Location and Position LinkTargets
  • language handling
    • language attributes in tag tree or marked content sequence
    • set document language from first top level set rule
  • fix grid layout issues due to TableCell and GridCell tags
  • fix overlapping tags generated by PAR grouping rule
  • text handling
    • soft hyphens
    • ensure there is whitespace between words and at the end of lines
    • 14.8.2.3.3 reverse chars
  • structure element nesting
    • proper nesting of grouping elements, BLSE, and ILSE
    • write placement attribute where it diverges from the typst element
    • 14.8.4.2 grouping elements may not contain marked content sequences
  • check the reading order of elements
  • snapshot testing (human readable tag trees)
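
The "PDF/UA-1 7.4.2 descending heading level check" item in the checklist can be illustrated with a minimal sketch. It assumes the rule is read as "a heading may be at most one level deeper than the heading before it, and the first heading must be level 1". The function name `first_skipped_level` and the plain `u32` levels are hypothetical and are not taken from this PR's code.

```rust
/// Returns the index of the first heading that skips a level when
/// descending, or `None` if the sequence passes the check.
///
/// Assumption: the document root counts as level 0, so the first
/// heading has to be level 1.
fn first_skipped_level(levels: &[u32]) -> Option<usize> {
    let mut prev = 0;
    for (i, &level) in levels.iter().enumerate() {
        // Going deeper by more than one level skips a heading level.
        if level > prev + 1 {
            return Some(i);
        }
        // Going back up by any number of levels is always allowed.
        prev = level;
    }
    None
}
```

With this reading, the level sequence 1, 2, 3, 2 passes, while 1, 3 is reported at index 1 because level 2 is skipped.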

@LaurenzV
Collaborator

> Note however that this cannot guarantee full compliance, since some parts can only reasonably be validated by a human.

It probably should be possible to ensure that at least machine checks always pass, right?

@saecki
Member Author

saecki commented Jul 16, 2025

> It probably should be possible to ensure that at least machine checks always pass, right?

Yeah, I don't see any other option. I think it just needs to be documented.

@saecki saecki force-pushed the pdf-accessibility branch 3 times, most recently from 71a9cbb to b7ccf97 Compare July 24, 2025 16:33
@saecki saecki force-pushed the pdf-accessibility branch 5 times, most recently from 370e6b8 to e5fcc3b Compare August 4, 2025 18:00
@saecki saecki force-pushed the pdf-accessibility branch 2 times, most recently from bd318de to 1bd7617 Compare August 7, 2025 10:00
@saecki saecki force-pushed the pdf-accessibility branch from 8689f04 to 943c621 Compare August 19, 2025 11:09
@Andrew15-5 Andrew15-5 mentioned this pull request Sep 2, 2025
1 task
@saecki saecki force-pushed the pdf-accessibility branch 4 times, most recently from 79d35c5 to 62f05ec Compare September 10, 2025 11:53
reknih added a commit that referenced this pull request Sep 11, 2025
This guide is intended as a companion for the work in #6619. It explains to the user what they must pay attention to in order to create an accessible file.
@saecki saecki force-pushed the pdf-accessibility branch 2 times, most recently from 55ff761 to fea13eb Compare September 18, 2025 22:18
@Andrew15-5
Contributor

BTW, I created a Russian translation: #6926.

reknih added a commit that referenced this pull request Sep 23, 2025
This guide is intended as a companion for the work in #6619. It explains to the user what they must pay attention to in order to create an accessible file.
@saecki saecki force-pushed the pdf-accessibility branch 2 times, most recently from c208dab to 020109f Compare September 25, 2025 22:18
@saecki saecki force-pushed the pdf-accessibility branch from 70cea26 to 9b6dfe0 Compare October 2, 2025 16:24
@laurmaedje laurmaedje marked this pull request as ready for review October 2, 2025 16:25
@saecki saecki force-pushed the pdf-accessibility branch from 9b6dfe0 to 562000b Compare October 2, 2025 16:35
saecki added 7 commits October 2, 2025 18:50
`Tagged` elements will be assigned a location and will generate
introspection tags, but they won't be available in the introspector.
All contiguous marked content sequences inside grouping elements are
wrapped inside Span structure elements. And all ILSE inside grouping
elements will be assigned block placement.
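
The wrapping step described in the commit note above can be sketched roughly as follows. This is only an illustration under assumptions: the `Child` and `Wrapped` enums, the numeric ids, and the function `wrap_marked_content` are hypothetical and do not mirror the actual types in this PR.

```rust
/// Hypothetical child of a grouping element before wrapping.
#[derive(Debug, Clone, PartialEq)]
enum Child {
    /// A marked content sequence, identified by a made-up id.
    Marked(u32),
    /// Any other structure element, named by its tag.
    Element(&'static str),
}

/// Hypothetical child of a grouping element after wrapping.
#[derive(Debug, Clone, PartialEq)]
enum Wrapped {
    /// A Span structure element holding a contiguous run of marked content.
    Span(Vec<u32>),
    /// A structure element that was passed through unchanged.
    Element(&'static str),
}

/// Wraps each contiguous run of marked content sequences in one Span.
fn wrap_marked_content(children: Vec<Child>) -> Vec<Wrapped> {
    let mut out: Vec<Wrapped> = Vec::new();
    for child in children {
        match child {
            Child::Marked(id) => {
                // Extend the current Span if the previous output node is one,
                // otherwise start a new Span for this run.
                if let Some(Wrapped::Span(ids)) = out.last_mut() {
                    ids.push(id);
                } else {
                    out.push(Wrapped::Span(vec![id]));
                }
            }
            Child::Element(tag) => out.push(Wrapped::Element(tag)),
        }
    }
    out
}
```

After this pass, a grouping element no longer holds marked content sequences directly, which matches the "14.8.4.2 grouping elements may not contain marked content sequences" item in the checklist.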
Ran `cargo-sort-derives` with the following `.sort-derives.toml` config:
```toml
order = [
    "Debug",
    "Default",
    "Copy",
    "Clone",
    "Eq",
    "PartialEq",
    "Ord",
    "PartialOrd",
    "Hash",
    "Cast",
    "Serialize",
    "Deserialize",
]
```
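
As a small illustration of what this ordering produces, here is a derive list sorted according to the config above. The struct `ExampleLevel` is hypothetical and exists only for this sketch. Only standard-library derives are used, so `Cast`, `Serialize`, and `Deserialize` are left out.

```rust
// Derives follow the configured order: Debug, Default, Copy, Clone,
// Eq, PartialEq, Ord, PartialOrd, Hash.
#[derive(Debug, Default, Copy, Clone, Eq, PartialEq, Ord, PartialOrd, Hash)]
struct ExampleLevel(u8);
```

The order of derives does not change behavior. A fixed order mainly keeps diffs small and derive lists easy to scan.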
Instead of:
- `EcoString::to_string`
- `String::from`
- `|s| s.to_string()`
@saecki saecki force-pushed the pdf-accessibility branch from 562000b to b14c08e Compare October 2, 2025 16:50
@laurmaedje laurmaedje merged commit b14c08e into main Oct 2, 2025
16 checks passed
@laurmaedje
Member

It is done.

@laurmaedje laurmaedje deleted the pdf-accessibility branch October 2, 2025 17:44
reknih added a commit that referenced this pull request Oct 3, 2025
This guide is intended as a companion for the work in #6619. It explains to the user what they must pay attention to in order to create an accessible file.
reknih added a commit that referenced this pull request Oct 3, 2025
This guide is intended as a companion for the work in #6619. It explains to the user what they must pay attention to in order to create an accessible file.
@Fevol Fevol mentioned this pull request Oct 5, 2025
@lzm0 lzm0 mentioned this pull request Oct 7, 2025
Bugg4 added a commit to Bugg4/typst that referenced this pull request Oct 7, 2025
Add Italian translation for terms introduced in typst#6619.