Please see Tweag's pandoc-wasm for an improved implementation of Pandoc on WebAssembly: https://github.com/tweag/pandoc-wasm
This project used Asterius to compile Pandoc's Haskell code, which has been deprecated in favour of the newer ghc-wasm-meta version of the GHC compiler. Tweag's newer pandoc-wasm project uses the GHC compiler and so has a much better engineering foundation. I recommend that future work on Pandoc for WebAssembly be targeted towards that project, rather than this one.
I have kept the original README.md content below, for historical purposes.
The universal document converter, compiled for WebAssembly and running in the browser.
Demo application: https://georgestagg.github.io/pandoc-wasm
This repository builds an npm package that wraps Pandoc, compiled for WebAssembly using the Asterius Haskell-to-Wasm compiler. A demo application is also included allowing for conversion between various document types.
Warning: Running Pandoc under WebAssembly using this library is fairly fragile at the moment (see the Extra Notes section below for details). Small documents seem to convert well, but there is definitely room for stability improvement when converting larger documents or including images.
First, install the pandoc-wasm package using npm:
npm install --save pandoc-wasm
Import the module, run init() to download the Wasm binary, and convert documents using run():
import { Pandoc } from "pandoc-wasm";
const pandoc = new Pandoc();
pandoc.init().then(async (pandoc) => {
  const result = await pandoc.run({
    text: "Some input text",
    options: { from: "markdown", to: "html" },
  });
  console.log(result);
});
See the example and src/app directories for more detailed examples.
The Haskell code that powers the run() function is a modified version of Pandoc's own built in pandoc-server code, and takes a similar options object to control how documents are converted. See Pandoc's server documentation for details on the options settings that can be passed to Pandoc.
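As a rough illustration of the shape of that argument, the sketch below assembles a request object with one extra option. The buildRequest helper is hypothetical and not part of pandoc-wasm. The standalone option is taken from Pandoc's server documentation, and it is an assumption that this modified build honours it.

```javascript
// Hypothetical helper (not part of pandoc-wasm): assemble the argument
// object for pandoc.run() from the input text, the two format names, and
// any further server-style options.
function buildRequest(text, from, to, extraOptions = {}) {
  return {
    text,
    options: { from, to, ...extraOptions },
  };
}

// "standalone" is an option documented for pandoc-server; whether this
// modified build supports it is an assumption.
const request = buildRequest("# Title\n\nBody text.", "markdown", "html", {
  standalone: true,
});

console.log(JSON.stringify(request.options));
// prints {"from":"markdown","to":"html","standalone":true}
```

The resulting object could then be passed as pandoc.run(request) once init() has completed.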
Supplemental files, such as images, can be included in the argument to the Pandoc .run() function. The files property should be a mapping from paths to file content, encoded either in a Uint8Array or a base64 encoded string:
pandoc.run({
  text: "",
  options: {
    'from': "markdown",
    'to': "html",
    'embed-resources': true
  },
  files: {
    'images/test.png': "iVBORw0KGgoAAAANSUhEUgAAADAAAAAlAQAAAAAsYlcCAAAACklEQVR4AWMYBQABAwABRUEDtQAAAABJRU5ErkJggg=="
  }
});
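To show how the two accepted encodings relate, here is a minimal sketch that converts Uint8Array values in a files mapping to base64 strings. The toBase64 and encodeFiles helpers are hypothetical and not part of pandoc-wasm. The library accepts a Uint8Array directly, so a conversion like this is only likely to matter when the mapping has to be serialised, for example as JSON.

```javascript
// Hypothetical helper (not part of pandoc-wasm): encode raw bytes as a
// base64 string using the standard btoa() function.
function toBase64(bytes) {
  let binary = "";
  for (const byte of bytes) {
    binary += String.fromCharCode(byte);
  }
  return btoa(binary);
}

// Hypothetical helper: return a copy of a files mapping in which every
// Uint8Array value has been replaced by its base64 encoding. String values
// are assumed to be base64 already and are passed through unchanged.
function encodeFiles(files) {
  const encoded = {};
  for (const [path, content] of Object.entries(files)) {
    encoded[path] = content instanceof Uint8Array ? toBase64(content) : content;
  }
  return encoded;
}

// The eight-byte PNG signature, used here as stand-in file content.
const pngSignature = new Uint8Array([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);
const files = encodeFiles({ "images/test.png": pngSignature });

console.log(files["images/test.png"]);
// prints iVBORw0KGgo=
```

The output matches the first characters of the base64 string in the example above, since every PNG file begins with the same signature bytes.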
Pandoc can be run from inside a Web Worker. I recommend the Comlink library as a way to handle communication between the main and worker threads. For an example see the demo application in the src/app directory, which uses this method.
- Asterius is deprecated in favour of the newer ghc-wasm-meta version of the GHC compiler. Once there is a simple way to use Template Haskell with ghc-wasm-meta, I'll switch to compiling with that toolchain.
- Pandoc relies on some Haskell libraries that use external C sources (e.g. zlib), which does not work when compiling with Asterius. For those libraries the functionality is instead replicated with JavaScript libraries called using Asterius's JS FFI.
- Asterius's --yolo mode has been used to avoid GC issues. Smaller documents seem to work OK, but it is easy to trigger "out of memory" errors. This should be less of a problem once we've switched to using ghc-wasm-meta.
- Node/Deno should be possible, but does not work right now due to the way fetch() is used to download the Pandoc binary and support files.
- No Lua filters right now. I think it should be possible to compile a C Lua interpreter using Emscripten (or something similar), then hook it up to the Pandoc wasm binary through a JS FFI.
A Docker development container has been built containing the required prerequisites for building Pandoc for WebAssembly using Asterius. The following commands are a possible simplified route to get up and running:
git clone https://github.com/georgestagg/pandoc-wasm
cd pandoc-wasm
npm install
make submodules
make docker-container
make
See the included Dockerfile for more details about the pre-built development container.
We use the same general process as used in the above projects to build Pandoc, but this project has the following advantages:
- Provides a newer version of Pandoc.
- Supports more readers and writers, including binary formats such as docx.
- Supports more options and extensions, including parsing YAML headers.
- Supports adding and embedding supplemental files, such as images, to document output.