How to create a PDF/A file with LaTeX
Jarmo Niemelä ([Link]@[Link])
29th October 2021
Contents
1 Introduction
2 Creating a PDF/A file
3 Checking and validating a PDF/A file
4 Recommendations and problems
5 PDF/A conversion with other programs
1 Introduction
PDF/A is a standardised version of the pdf file format, and it is intended
for archiving and long-term preservation of electronic documents. A PDF/A
file must contain all the information needed for displaying and printing the
document. This includes text, graphics, fonts and colour information. Audio
and video content and encryption are forbidden.
There are four versions of the PDF/A standard: PDF/A-1, PDF/A-2,
PDF/A-3 and PDF/A-4. Versions 1–3 specify conformance levels a and b.
Level a (accessible) meets all requirements for the standard. Level b (basic)
conformance requires only that the document’s visual appearance is pre-
served. PDF/A-2 and PDF/A-3 contain a third level u (Unicode), which ex-
pands conformance level b with an additional requirement that all text in
the document has Unicode mapping. PDF/A-4 does not use conformance
levels a, b and u but defines a new level f corresponding to version PDF/A-3.
In general, PDF/A-4 does not have as strict requirements as the previous
versions.
PDF/A format is required in Tampere University theses. Any PDF/A
version and conformance level can be used.
2 Creating a PDF/A file
Below is a LaTeX template for creating a PDF/A-compliant file:
\begin{filecontents*}[overwrite]{\[Link]}
\Title{Document’s title}
\Author{Author’s name}
\Language{en-GB}
\Subject{The abstract or short description.}
\Keywords{keyword1\sep keyword2\sep keyword3}
\end{filecontents*}
\documentclass[a4paper,12pt]{article}
\usepackage[utf8]{inputenc}
\usepackage[UKenglish]{babel}
\usepackage{colorprofiles}
\usepackage[a-2b,mathxmp]{pdfx}[2018/12/22]
\hypersetup{pdfstartview=}
\begin{document}
The text of the document goes here.
\end{document}
The pdfx macro package
Use the pdfx package to create a PDF/A file from the LaTeX source:
\usepackage[a-2b,mathxmp]{pdfx}[2018/12/22]
The option a-2b selects PDF/A version and conformance level. The default
value is a-1b. Always use the option mathxmp because it allows mathematical
symbols in metadata, and it also corrects some errors in presenting metadata
in pdf viewers. The option [2018/12/22] at the end of the command means
that no version of the package older than the specified date is accepted. The
pdfx macro package still has some shortcomings and problems, and there are
more of these in the older versions.
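As a hedged illustration of the option syntax, the lines below show how the
other level b and level u variants mentioned in the introduction could be
requested. The option names a-2u and a-3b are assumed to follow the same
pattern as a-2b and a-1b; check the pdfx documentation for the full list
supported by your version. Only one \usepackage line may be active at a time.

```latex
% Pick exactly one of the following lines in the preamble.
\usepackage[a-2b,mathxmp]{pdfx}[2018/12/22]   % PDF/A-2, level b (as in the template)
% \usepackage[a-2u,mathxmp]{pdfx}[2018/12/22] % PDF/A-2, level u (Unicode mapping)
% \usepackage[a-3b,mathxmp]{pdfx}[2018/12/22] % PDF/A-3, level b
```

Whichever option you choose, validate the resulting file against that same
version and level.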
The pdfx package loads hyperref and xcolor packages, among others, so
you don’t have to call these explicitly. You can set hyperref options using the
\hypersetup command.
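A minimal sketch of setting hyperref options after pdfx has been loaded. The
link colour is an illustrative choice, not a recommendation of this guide, and
the result should still be validated.

```latex
\usepackage[a-2b,mathxmp]{pdfx}[2018/12/22]
% pdfx has already loaded hyperref and xcolor at this point,
% so \hypersetup and colour names work without further \usepackage calls.
\hypersetup{
  colorlinks=true,   % coloured link text
  allcolors=blue,    % one colour for all link types (illustrative)
  pdfstartview=      % keep the pdf viewer's default startup view
}
```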
Colour profile
The colorprofiles package takes care of the colour profile required by the
PDF/A format. The pdfx package loads colorprofiles automatically, but only
if colorprofiles is already installed, so it must be installed first.
MiKTeX usually installs the missing packages automatically when needed,
in which case it is sufficient to add the command
\usepackage{colorprofiles}
before calling the pdfx package. After the colorprofiles package is installed,
this command is no longer needed.
Metadata
The PDF/A standard requires that document metadata, such as the docu-
ment’s title, author’s name and keywords, are embedded in a specified format.
The pdfx package reads the metadata from a text file \[Link],
where the command \jobname contains the base name of the document’s
main LaTeX file, for example [Link]. Usually, the most
convenient way is to include the \[Link] file at the beginning of the
main file, within a filecontents* environment:
\begin{filecontents*}[overwrite]{\[Link]}
\Title{Document’s title}
\Author{Author’s name}
\Language{en-GB}
\Subject{The abstract or short description.}
\Keywords{keyword1\sep keyword2\sep keyword3}
\end{filecontents*}
If your version of LaTeX was released before October 2019, the overwrite
option of the filecontents* environment does not work. In that case, replace
the command
\begin{filecontents*}[overwrite]{\[Link]}
with the commands
\RequirePackage{filecontents}
\begin{filecontents*}{\[Link]}
All the supported metadata fields are listed in the documentation of the pdfx
package. None of the metadata fields are mandatory, but it is recommended to
specify at least the fields \Title, \Author and \Language. Multiple authors,
keywords and languages should be separated by the command \sep.
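For instance, metadata fields with several values could be written as follows.
This is only a sketch: the names, keywords and language codes are
placeholders, and the lines belong inside the metadata file described above.

```latex
% Several values in one field are separated with \sep.
\Author{First Author\sep Second Author}
\Keywords{PDF/A\sep archiving\sep metadata}
\Language{en-GB\sep fi-FI}
```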
Within the metadata fields, you can type LaTeX’s reserved characters
$, &, #, _, ~, ^, { and } as themselves, but use the commands \% and
\textbackslash for the characters % and \. This implies that, for example,
superscripts and subscripts can only be represented with the corresponding
Unicode characters.
If the metadata fields contain mathematical symbols, either as Unicode
characters or represented with LaTeX’s commands, you must use the mathxmp
option of the pdfx package. Mathematical symbols must not be surrounded
with dollar signs because these would show in the metadata as such.
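The rules for special characters and mathematical symbols can be sketched
with a few metadata lines. The field contents are invented for illustration,
and \alpha stands in for any symbol command that the mathxmp option supports;
this guide does not list which commands those are, so check the metadata in a
pdf viewer afterwards.

```latex
% Reserved characters: & and # are typed as themselves, % needs a backslash.
\Title{Costs & benefits of a 100\% sample}
% Superscripts are entered as Unicode characters, without dollar signs.
\Subject{Integration of x² − a² for item #1}
% A symbol written as a LaTeX command (requires the mathxmp option).
\Keywords{\alpha\sep integration}
```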
3 Checking and validating a PDF/A file
After the PDF/A file is ready, check its text, images, bookmarks, links and
metadata (File → Properties → Description) in Adobe Acrobat Reader or in
PDF-XChange Editor. Lastly, you must ensure that the file actually complies
with the PDF/A standard. Always validate PDF/A files before archiving them
and, in particular, before submitting your thesis for evaluation.
You can validate PDF/A files with the following programs:
3-Heights PDF Validator Online Tool is a free service for validating
PDF/A documents.
veraPDF is a free open source PDF/A validator. veraPDF requires the Java
Runtime Environment. A quick start guide: [Link]/gui.
Adobe Acrobat Pro’s Preflight tool (Tools → PDF Standards → Pre-
flight) can validate PDF/A documents and repair possible validation
errors. Acrobat Pro is commercial software, but a 7-day trial version is
available.
Callas pdfaPilot can validate and repair PDF/A files in much the same way
as Adobe Acrobat Pro. Callas pdfaPilot is commercial software, but a
14-day trial version is available.
These validation programs sometimes give different results. It is usually
sufficient that the file validates with one of the programs listed above, so
a pdf file that passes the validation tests of one program does not have to
be checked with the others. If a pdf file fails the validation tests of one
program, you can test it with another program.
4 Recommendations and problems
Bookmarks
Check the document’s section titles in Adobe Reader’s or PDF-XChange
Editor’s bookmarks. Avoid using mathematical symbols in section titles be-
cause mathematical symbols and other special characters may not show up
correctly in bookmarks. In many cases, you can correct these with the com-
mand \texorpdfstring{}{}, whose second argument is used in the book-
marks. For example,
\section{The integral \texorpdfstring
{$\displaystyle\int\sqrt{x^2 - a^2}\,dx$}
{\int\textsurd(x\texttwosuperior \textminus
a\texttwosuperior)\unichar{"2009}dx}}
The second argument of \texorpdfstring can contain Unicode characters,
either copied from somewhere or entered with the \unichar{"XXXX} com-
mand, where XXXX is the character’s hexadecimal code in uppercase.
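A shorter sketch of the same technique, using only a command that already
appears in the example above. The section title itself is invented for
illustration.

```latex
% First argument: typeset in the document.
% Second argument: plain-text replacement used in the bookmark.
\section{The energy \texorpdfstring{$E = mc^2$}{E = mc\texttwosuperior}}
```

After compiling, open the bookmarks panel of the pdf viewer and confirm that
the superscript shows up as intended.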
Included pdf files
Imported pdf graphics files and other included pdfs must have their fonts
embedded. Otherwise, the pdf file produced from the LaTeX document does
not contain those fonts and will not pass PDF/A validation. The included
pdf files do not have to be in PDF/A format. Embed the fonts with the same
programs with which the included pdf files were made. If this is not possible,
you can embed the fonts with the free Ghostscript program (see footnote 1). The required
command in Windows command prompt is
"C:\Program Files\gs\gs9.54.0\bin\[Link]" -dBATCH
-dNOPAUSE -sDEVICE=pdfwrite -dAutoRotatePages=/None
-sOutputFile=[Link] -c "<</NeverEmbed [ ]>>
setdistillerparams" -f [Link]
The file path at the beginning of the command depends on the version of
Ghostscript. If you need anything other than Ghostscript’s default fonts, you
can give the list of font directories with the option -sFONTPATH. For example,
-sFONTPATH="C:/Windows/Fonts".
Alternatively, you can make an ordinary pdf file with LaTeX and convert it
to PDF/A format with another program. The missing fonts will be embedded
automatically during the conversion.
Limitations of the pdfx package
The current version (v1.6.3) of the pdfx package cannot generate level a (1a,
2a, 3a) conforming PDF/A files.
PdfLaTeX cannot produce a valid PDF/A file if one of the font-defining
macro packages arev, kpfonts or mathdesign is used along with pdfx. How-
ever, this problem does not exist with XeLaTeX. You don’t have to modify
the LaTeX code when switching from pdfLaTeX to XeLaTeX.
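The template compiles unchanged with both engines, but if you later want
engine-specific setup, one possible approach is to branch on the engine. This
is a sketch under assumptions: it relies on the iftex package, which this
guide does not otherwise use, and loading fontspec under XeLaTeX is an
illustrative choice rather than a requirement of pdfx.

```latex
\documentclass[a4paper,12pt]{article}
\usepackage{iftex}               % provides \ifPDFTeX (assumed to be installed)
\ifPDFTeX
  \usepackage[utf8]{inputenc}    % input encoding for pdfLaTeX
\else
  \usepackage{fontspec}          % font setup for Unicode engines such as XeLaTeX
\fi
\usepackage[UKenglish]{babel}
\usepackage[a-2b,mathxmp]{pdfx}[2018/12/22]
\hypersetup{pdfstartview=}
\begin{document}
The text of the document goes here.
\end{document}
```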
The pdfx package expects that XeLaTeX is used with the optional argu-
ment -shell-escape. Because using -shell-escape is potentially a security
risk, it is better to create a new processing tool than to change XeLaTeX’s
default options. You can do this in the TeXworks editor in the following way:
1. Select Edit → Preferences → Typesetting.
2. Click the plus button next to Processing tools.
3. Create a new processing tool “XeLaTeX shell-escape” according to
figure 1 when using MiKTeX, or according to figure 2 when using TeX
Live.
Footnote 1: You will also need the Ghostscript fonts: [Link]
files/latest/download. Copy the fonts folder to Ghostscript’s installation directory.
Figure 1: Adding the option -shell-escape in MiKTeX.
Figure 2: Adding the option -shell-escape in TeX Live.
The option pdfstartview of the hyperref package selects the pdf viewer’s
startup page view. The values FitH, FitV, FitR, FitBH and FitBR of this
option will cause a validation error in 3-Heights PDF validator. These val-
ues should therefore not be used. If the option is given without a value as
pdfstartview=, then the pdf viewer’s default setting is used as the startup
page view.
5 PDF/A conversion with other programs
If you cannot create a valid PDF/A file with LaTeX, you must use another
program for the conversion. In any case, write the document metadata with
LaTeX. You don’t necessarily need the pdfx package for this because you can
save the metadata with the hyperxmp and hyperref packages too:
\documentclass[a4paper,12pt]{article}
\usepackage[utf8]{inputenc}
\usepackage[UKenglish]{babel}
\usepackage{hyperxmp}
\usepackage{hyperref}
\hypersetup{%
pdftitle={Document’s title},
pdfauthor={Author’s name},
pdflang={en-GB},
pdfsubject={The abstract or short description.},
pdfkeywords={keyword1, keyword2, keyword3},
pdfstartview=}
\begin{document}
The text of the document goes here.
\end{document}
If the metadata contains LaTeX’s reserved characters, you must write them
with the commands \%, \$, \& etc. Include mathematical symbols as Unicode
characters.
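A brief sketch of these rules for the hyperref-based setup. The field contents
are invented for illustration; the escaped characters use the commands named
above, and the superscript is typed directly as a Unicode character.

```latex
\hypersetup{%
  % Reserved characters are written with their backslash commands.
  pdftitle={Costs \& benefits of a 100\% sample},
  pdfsubject={Prices in \$ for item \#1},
  % Mathematical symbols are typed as Unicode characters.
  pdfkeywords={area in m²}}
```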
Muuntaja service
The students and staff of Tampere University can create PDF/A files with
the Muuntaja service at [Link]. Write the document metadata first
with LaTeX.
Adobe Acrobat Pro
With Adobe Acrobat Pro you can convert pdf files to PDF/A format: Tools
→ PDF Standards → Save as PDF/A. You can repair validation errors with
the Preflight tool (Tools → PDF Standards → Preflight). In Preflight, select
the Profiles view, then choose the desired conversion profile from the PDF/A
group, and finally click the Analyze and fix button in the lower right corner
of the window.
Callas pdfaPilot
Callas pdfaPilot can create and repair PDF/A files in much the same way as
Adobe Acrobat Pro. Editing document metadata with Callas pdfaPilot is
laborious, so write the metadata with LaTeX instead.
Other programs
There are several online services for converting pdf files to PDF/A format.
At least Free PDF Online and PDFen seem to work well.