Commit a543b3e

Add IDBFS persistent storage under PostMessage communication channel (#445)
* Handle IDBFS persistent storage under PostMessage
* Resolve syncfs() only once async process completes
* Add webr::syncfs()
* Update IDBFS & syncfs documentation
1 parent 0c5a90f commit a543b3e

15 files changed, with 229 additions and 11 deletions.
NEWS.md

+2
@@ -6,6 +6,8 @@
 
 * The capturing mechanism of `captureR()` has been updated so that memory reallocation is performed when outputting very long lines. If reallocation is not possible (e.g. the environment does not have enough free memory to hold the entire line), the previous behaviour of truncating the line output is maintained (#434).
 
+* Enabled the Emscripten IDBFS virtual filesystem driver. This filesystem type can be used to persist data in web browser storage across page reloads. This filesystem type must be used with the `PostMessage` communication channel (#56, #442).
+
 ## Breaking changes
 
 * The `ServiceWorker` communication channel has been deprecated. Users should use the `SharedArrayBuffer` channel where cross-origin isolation is possible, or otherwise use the `PostMessage` channel. For the moment the `ServiceWorker` channel can still be used, but emits a warning at start up. The channel will be removed entirely in a future version of webR.

R/Makefile

+1-1
@@ -147,7 +147,7 @@ MAIN_LDFLAGS += -s EXIT_RUNTIME=1
 MAIN_LDFLAGS += -s ERROR_ON_UNDEFINED_SYMBOLS=0
 MAIN_LDFLAGS += -s EXPORTED_RUNTIME_METHODS=$(EXPORTED_RUNTIME_METHODS)
 MAIN_LDFLAGS += -s FETCH=1
-MAIN_LDFLAGS += -lworkerfs.js -lnodefs.js
+MAIN_LDFLAGS += -lworkerfs.js -lnodefs.js -lidbfs.js
 MAIN_LDFLAGS += $(FORTRAN_WASM_LDADD)
 MAIN_LDFLAGS += $(WASM_OPT_LDADD)
 

packages/webr/DESCRIPTION

+1-1
@@ -17,4 +17,4 @@ Imports:
 Encoding: UTF-8
 LazyData: true
 Roxygen: list(markdown = TRUE)
-RoxygenNote: 7.2.3
+RoxygenNote: 7.3.1

packages/webr/NAMESPACE

+1
@@ -14,6 +14,7 @@ export(mount)
 export(pager_install)
 export(require_shim)
 export(shim_install)
+export(syncfs)
 export(test_package)
 export(unmount)
 useDynLib(webr, .registration = TRUE)

packages/webr/R/library.R

+2
@@ -21,8 +21,10 @@
 #' `show_menu` argument. By default, if no global option is set and no argument
 #' is provided, the menu will not be shown.
 #'
+#' @param pkg Character vector of package names
 #' @param show_menu Show a menu asking the user if they would like to install
 #' the package if it is missing. Defaults to `getOption("webr.show_menu")`.
+#' @param ... Other arguments to be passed to `library` and `require`.
 #'
 #' @export
 library_shim <- function(pkg, ..., show_menu = getOption("webr.show_menu")) {

packages/webr/R/mount.R

+26-1
@@ -16,12 +16,19 @@
 #' `mountpoint`. This filesystem type can only be used when webR is running
 #' under Node.
 #'
+#' When mounting an Emscripten "idbfs" type filesystem, files will be persisted
+#' to or populated from a browser-based IndexedDB database whenever the
+#' JavaScript function `Module.FS.syncfs` is invoked. See the Emscripten `IDBFS`
+#' documentation for more information. This filesystem type can only be used
+#' when webR is running in a web browser and using the `PostMessage`
+#' communication channel.
+#'
 #' @param mountpoint a character string giving the path to a directory to mount
 #' onto in the Emscripten virtual filesystem.
 #' @param source a character string giving the location of the data source to be
 #' mounted.
 #' @param type a character string giving the type of Emscripten filesystem to be
-#' mounted: "workerfs" or "nodefs".
+#' mounted: "workerfs", "nodefs", or "idbfs".
 #'
 #' @export
 mount <- function(mountpoint, source, type = "workerfs") {
@@ -34,6 +41,8 @@ mount <- function(mountpoint, source, type = "workerfs") {
     invisible(.Call(ffi_mount_workerfs, base_url, mountpoint))
   } else if (tolower(type) == "nodefs") {
     invisible(.Call(ffi_mount_nodefs, source, mountpoint))
+  } else if (tolower(type) == "idbfs") {
+    invisible(.Call(ffi_mount_idbfs, mountpoint))
   } else {
     stop(paste("Unsupported Emscripten Filesystem type:", type))
   }
@@ -44,3 +53,19 @@ mount <- function(mountpoint, source, type = "workerfs") {
 unmount <- function(mountpoint) {
   invisible(.Call(ffi_unmount, mountpoint))
 }
+
+#' Synchronise the Emscripten virtual filesystem
+#'
+#' @description
+#' Uses the Emscripten filesystem API to synchronise all mounted virtual
+#' filesystems with their backing storage, where it exists. The `populate`
+#' argument controls the direction of the synchronisation between Emscripten's
+#' internal data and the file system's persistent store.
+#'
+#' @param populate A boolean. When `TRUE`, initialises the filesystem with data
+#' from persistent storage. When `FALSE`, writes current filesystem data to
+#' the persistent storage.
+#' @export
+syncfs <- function(populate) {
+  invisible(.Call(ffi_syncfs, populate))
+}

packages/webr/man/library_shim.Rd

+4
Some generated files are not rendered by default.

packages/webr/man/mount.Rd

+8-1
Some generated files are not rendered by default.

packages/webr/man/syncfs.Rd

+19
Some generated files are not rendered by default.

packages/webr/src/init.c

+4
@@ -10,6 +10,8 @@ extern SEXP ffi_dev_canvas_cache(void);
 extern SEXP ffi_dev_canvas_destroy(SEXP);
 extern SEXP ffi_mount_workerfs(SEXP, SEXP);
 extern SEXP ffi_mount_nodefs(SEXP, SEXP);
+extern SEXP ffi_mount_idbfs(SEXP);
+extern SEXP ffi_syncfs(SEXP);
 extern SEXP ffi_unmount(SEXP);
 
 static
@@ -23,6 +25,8 @@ const R_CallMethodDef CallEntries[] = {
   { "ffi_dev_canvas_destroy", (DL_FUNC) &ffi_dev_canvas_destroy, 1},
   { "ffi_mount_workerfs", (DL_FUNC) &ffi_mount_workerfs, 2},
   { "ffi_mount_nodefs", (DL_FUNC) &ffi_mount_nodefs, 2},
+  { "ffi_mount_idbfs", (DL_FUNC) &ffi_mount_idbfs, 1},
+  { "ffi_syncfs", (DL_FUNC) &ffi_syncfs, 1},
   { "ffi_unmount", (DL_FUNC) &ffi_unmount, 1},
   { NULL, NULL, 0}
 };

packages/webr/src/mount.c

+49
@@ -15,6 +15,14 @@
     Rf_error("`" #arg "` can't be `NA`."); \
   }
 
+#define CHECK_LOGICAL(arg) \
+  if (!Rf_isLogical(arg) || LENGTH(arg) != 1) { \
+    Rf_error("`" #arg "` must be a logical."); \
+  } \
+  if (LOGICAL(arg)[0] == NA_LOGICAL){ \
+    Rf_error("`" #arg "` can't be `NA`."); \
+  }
+
 SEXP ffi_mount_workerfs(SEXP source, SEXP mountpoint) {
 #ifdef __EMSCRIPTEN__
   CHECK_STRING(source);
@@ -74,6 +82,47 @@ SEXP ffi_mount_nodefs(SEXP source, SEXP mountpoint) {
 #endif
 }
 
+SEXP ffi_mount_idbfs(SEXP mountpoint) {
+#ifdef __EMSCRIPTEN__
+  CHECK_STRING(mountpoint);
+
+  EM_ASM({
+    // Stop if we're not able to use an IDBFS filesystem object
+    if (typeof IN_NODE === 'boolean' && IN_NODE === true) {
+      const msg = Module.allocateUTF8OnStack(
+        'The `IDBFS` filesystem object can only be used when running in a web browser.'
+      );
+      Module._Rf_error(msg);
+    }
+    const mountpoint = UTF8ToString($0);
+    try {
+      Module.FS.mount(Module.FS.filesystems.IDBFS, {}, mountpoint);
+    } catch (e) {
+      let msg = e.message;
+      if (e.name === "ErrnoError" && e.errno === 10) {
+        const dir = Module.UTF8ToString($0);
+        msg = "Unable to mount directory, `" + dir + "` is already mounted.";
+      }
+      Module._Rf_error(Module.allocateUTF8OnStack(msg));
+    }
+  }, R_CHAR(STRING_ELT(mountpoint, 0)));
+
+  return R_NilValue;
+#else
+  Rf_error("Function must be running under Emscripten.");
+#endif
+}
+
+SEXP ffi_syncfs(SEXP populate) {
+#ifdef __EMSCRIPTEN__
+  CHECK_LOGICAL(populate);
+  EM_ASM({ Module.FS.syncfs($0, () => {}) }, LOGICAL(populate)[0]);
+  return R_NilValue;
+#else
+  Rf_error("Function must be running under Emscripten.");
+#endif
+}
+
 SEXP ffi_unmount(SEXP mountpoint) {
 #ifdef __EMSCRIPTEN__
   CHECK_STRING(mountpoint);

src/docs/mounting.qmd

+79-4
@@ -10,15 +10,21 @@ The [Emscripten filesystem API](https://emscripten.org/docs/api_reference/Filesy
 
 Mounting images and directories in this way gives the Wasm R process access to arbitrary external data, potentially including datasets, scripts, or R packages [pre-compiled for WebAssembly](building.qmd).
 
-Emscripten's API provides several types of virtual filesystem, but for technical reasons^[Currently, webR blocks in the JavaScript worker thread while it waits for R input to be evaluated. This blocking means that Emscripten filesystems that depend on asynchronous browser APIs, such as [`IDBFS`](https://emscripten.org/docs/api_reference/Filesystem-API.html#filesystem-api-idbfs), do not work.] only the following filesystems are available for use with webR.
+Emscripten's API allows for several types of virtual filesystem, depending on the execution environment. The following filesystems are available for use with webR:
 
 | Filesystem | Description | Web Browser | Node.js |
 |------|-----|------|------|
-| `WORKERFS` | Mount filesystem images. | &#x2705; | &#x2705; |
+| `WORKERFS` | Mount Emscripten filesystem images. | &#x2705; | &#x2705;[^workerfs] |
 | `NODEFS` | Mount existing host directories. | &#x274C; | &#x2705; |
+| `IDBFS` | Browser-based persistent storage using the [IndexedDB API](https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API). | &#x2705;[^idbfs] | &#x274C; |
+
+[^workerfs]: Be aware of the current GitHub issue [#328](https://github.com/r-wasm/webr/issues/328).
+[^idbfs]: Using the `PostMessage` [communication channel](communication.qmd) only.
 
 ## Emscripten filesystem images
 
+Emscripten filesystem images can be mounted using the `WORKERFS` filesystem type.
+
 The [`file_packager`](https://emscripten.org/docs/porting/files/packaging_files.html#packaging-using-the-file-packager-tool) tool, provided by Emscripten, takes in a directory structure as input and produces webR compatible filesystem images as output. The [`file_packager`](https://emscripten.org/docs/porting/files/packaging_files.html#packaging-using-the-file-packager-tool) tool may be invoked from R using the [rwasm](https://r-wasm.github.io/rwasm/) R package:
 
 ```{r eval=FALSE}
@@ -105,12 +111,12 @@ See the [Emscripten `FS.mount()` documentation](https://emscripten.org/docs/api_
 
 ## Mount an existing host directory
 
+The `NODEFS` filesystem type maps directories that exist on the host machine so that they are accessible in the WebAssembly process.
+
 ::: callout-warning
 `NODEFS` is only available when running webR under Node.js.
 :::
 
-The `NODEFS` filesystem type maps directories that exist on the host machine so that they are accessible in the WebAssembly process.
-
 To mount the directory `./extra` on the virtual filesystem at `/data`, use either the JavaScript or R mount API with the filesystem type set to `"NODEFS"`.
 
 ::: {.panel-tabset}
@@ -130,6 +136,75 @@ webr::mount(
 )
 ```
 
+:::
+
+## IndexedDB Filesystem Storage
+
+When using webR in a web browser, an [IndexedDB](https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API)-based persistent storage space can be mounted using the `IDBFS` filesystem type.
+
+::: {.callout-warning}
+
+Due to the way webR blocks for input in the worker thread, the `IDBFS` filesystem type **does not work** when using the `SharedArrayBuffer` communication channel. WebR must be configured to use the `PostMessage` communication channel to use `IDBFS` persistent storage.
+
+:::
+
+### Mounting
+
+First, create a directory to contain the IndexedDB filesystem, then use either the JavaScript or R mount API with type `"IDBFS"`.
+
+::: {.panel-tabset}
+## JavaScript
+
+``` javascript
+await webR.FS.mkdir('/data');
+await webR.FS.mount('IDBFS', {}, '/data');
+await webR.FS.syncfs(true);
+```
+
+## R
+```{r eval=FALSE}
+dir.create("/data")
+webr::mount(mountpoint = "/data", type = "IDBFS")
+webr::syncfs(TRUE)
+```
 
 :::
 
+After mounting the filesystem using [`mount()`](api/r.html#mount), the [`syncfs()`](api/r.html#syncfs) function should be invoked with its `populate` argument set to `true`. This extra step is **required** to initialise the virtual filesystem with any previously existing data files in the browser's IndexedDB storage. Without it, the filesystem will always be initially mounted as an empty directory.
+
+For more information, see the Emscripten FS API [`IDBFS` and `FS.syncfs()`](https://emscripten.org/docs/api_reference/Filesystem-API.html#filesystem-api-idbfs) documentation.
+
+### Persisting the filesystem to IndexedDB
+
+The `syncfs()` function should be invoked with its `populate` argument set to `false` to persist the current state of the filesystem to the browser's IndexedDB storage.
+
+::: {.panel-tabset}
+## JavaScript
+
+``` javascript
+await webR.FS.syncfs(false);
+```
+
+## R
+```{r eval=FALSE}
+webr::syncfs(FALSE)
+```
+
+:::
+
+After writing to the virtual filesystem you should be sure to invoke `syncfs(false)` before the web page containing webR is closed to ensure that the filesystem data is flushed and written to the IndexedDB-based persistent storage.
+
+::: {.callout-warning}
+
+Operations performed using IndexedDB are done asynchronously. If you are mounting `IDBFS` filesystems and accessing data non-interactively you should use the JavaScript API and be sure to wait for the `Promise` returned by `webR.FS.syncfs(false)` to resolve before continuing, for example by using the `await` keyword.
+
+In a future version of webR the `webr::syncfs()` function will similarly return a Promise-like object.
+:::
+
+### Web storage caveats
+
+Filesystem data stored in an IndexedDB database can only be accessed within the current [origin](https://developer.mozilla.org/en-US/docs/Glossary/Origin), loosely defined as the current web page's host domain and port.
+
+The way in which web browsers decide how much storage space to allocate for data and what to remove when limits are reached differs between browsers and is not always simple to calculate. Be aware of browser [storage quotas and eviction criteria](https://developer.mozilla.org/en-US/docs/Web/API/Storage_API/Storage_quotas_and_eviction_criteria) and note that data stored in an `IDBFS` filesystem type is stored only on a "best-effort" basis. It can be removed by the browser at any time, autonomously or by the user interacting through the browser's UI.
+
+In private browsing mode, for example, stored data is usually deleted when the private session ends.
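
As a rough illustration of the quota caveats in the documentation added here, a page could ask the browser how much origin storage is in use before relying on `IDBFS`. The sketch below assumes the standard `StorageManager` interface (`navigator.storage.estimate()` and `navigator.storage.persisted()`). The helper name `describeStorage` and its `storage` parameter are hypothetical and are not part of webR or of this commit.

```javascript
// Hypothetical helper, not part of webR: summarise origin storage usage.
// `storage` is expected to look like the browser's `navigator.storage`
// (a StorageManager); it is passed in so the function can also be
// exercised outside a browser.
async function describeStorage(storage) {
  if (!storage || typeof storage.estimate !== 'function') {
    // StorageManager is unavailable, e.g. outside a secure context.
    return { available: false };
  }
  const { usage, quota } = await storage.estimate();
  // `persisted()` reports whether the origin's data is exempt from
  // best-effort eviction; treat a missing method as "not persisted".
  const persisted =
    typeof storage.persisted === 'function' ? await storage.persisted() : false;
  return { available: true, usage, quota, persisted };
}
```

In a browser this might be called as `describeStorage(navigator.storage)`. The reported figures are estimates only, so they are better treated as a hint than as a guarantee that a later `syncfs(false)` will succeed.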

src/webR/webr-chan.ts

+6
@@ -156,6 +156,12 @@ export interface FSMountMessage extends Message {
   };
 }
 
+/** @internal */
+export interface FSSyncfsMessage extends Message {
+  type: 'syncfs';
+  data: { populate: boolean };
+}
+
 /** @internal */
 export interface FSReadFileMessage extends Message {
   type: 'readFile';

src/webR/webr-main.ts

+6-1
@@ -25,6 +25,7 @@ import {
   EvalROptions,
   FSMessage,
   FSMountMessage,
+  FSSyncfsMessage,
   FSReadFileMessage,
   FSWriteFileMessage,
   InstallPackagesOptions,
@@ -97,7 +98,7 @@ export type FSNode = {
 };
 
 /** An Emscripten Filesystem type */
-export type FSType = 'NODEFS' | 'WORKERFS';
+export type FSType = 'NODEFS' | 'WORKERFS' | 'IDBFS';
 
 /**
  * Configuration settings to be used when mounting Filesystem objects with
@@ -474,6 +475,10 @@ export class WebR {
       const msg: FSMountMessage = { type: 'mount', data: { type, options, mountpoint } };
       await this.#chan.request(msg);
     },
+    syncfs: async (populate: boolean): Promise<void> => {
+      const msg: FSSyncfsMessage = { type: 'syncfs', data: { populate } };
+      await this.#chan.request(msg);
+    },
     readFile: async (path: string, flags?: string): Promise<Uint8Array> => {
       const msg: FSReadFileMessage = { type: 'readFile', data: { path, flags } };
       const payload = await this.#chan.request(msg);
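
The `syncfs` method added to `webR.FS` in this file can be combined with `mkdir`, `mount`, and `writeFile` into a full save cycle. The following is a minimal sketch rather than webR's own code: `saveToIdbfs` is a hypothetical helper, and `fs` stands for any object shaped like `webR.FS` on a `WebR` instance that is assumed to have been started with the `PostMessage` channel.

```javascript
// Hypothetical helper, not part of webR. `fs` is assumed to expose the
// promise-returning methods of `webR.FS` that are used below.
async function saveToIdbfs(fs, dir, name, bytes) {
  await fs.mkdir(dir);               // directory to mount the filesystem onto
  await fs.mount('IDBFS', {}, dir);
  await fs.syncfs(true);             // populate from IndexedDB before writing
  const path = `${dir}/${name}`;
  await fs.writeFile(path, bytes);   // `bytes` is expected to be a Uint8Array
  await fs.syncfs(false);            // flush the new state back to IndexedDB
  return path;
}
```

A call might look like `await saveToIdbfs(webR.FS, '/data', 'notes.txt', new TextEncoder().encode('hello'))`. Every step is awaited, so the returned promise resolves only after the final `syncfs(false)` request has completed, which is the ordering the documentation in this commit asks for.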
