
Revamp data loading and deprecate decode functions #5671

laurmaedje merged 2 commits into main from revamp-data-loading on Jan 9, 2025

@laurmaedje (Member) commented on Jan 8, 2025

This PR refactors how files are loaded by the various path-taking functions. With these changes, every function that accepts a path today now also accepts bytes, for full flexibility. The existing .decode functions are deprecated, effective immediately.

API Changes

  • image, cbor, csv, json, toml, xml, and yaml now accept either a path string or bytes, and their .decode variants are deprecated.
  • pdf.embed always takes a path (since the PDF needs it either way) and optionally bytes (in which case the path is not read from). Its .decode variant is removed without deprecation since it had not been released yet.
  • plugin, bibliography, bibliography.style, cite.style, raw.theme, and raw.syntaxes now accept bytes in addition to a path string (some also accept an array of any mix of the two). These did not have a .decode variant, so this adds new flexibility.
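
The pdf.embed rule above (the path is always required, bytes are optional, and the file is read only when no bytes are given) can be sketched in Rust as follows. This is a hypothetical illustration, not Typst's implementation: EmbedArgs, Embedded, resolve_embed and the read_file callback are names assumed for the example.

```rust
use std::io;

/// Hypothetical stand-in for the arguments of pdf.embed:
/// the path is always present, the bytes are optional.
struct EmbedArgs {
    path: String,
    data: Option<Vec<u8>>,
}

/// Resolved attachment; the path is kept because the PDF needs it either way.
#[derive(Debug, PartialEq)]
struct Embedded {
    path: String,
    bytes: Vec<u8>,
}

/// Resolve the payload. `read_file` is a hypothetical loader callback,
/// so the sketch does not depend on a real file system.
fn resolve_embed<F>(args: EmbedArgs, read_file: F) -> io::Result<Embedded>
where
    F: FnOnce(&str) -> io::Result<Vec<u8>>,
{
    let bytes = match args.data {
        // Bytes were supplied: the path is NOT read from.
        Some(bytes) => bytes,
        // No bytes: fall back to reading the file at the path.
        None => read_file(&args.path)?,
    };
    Ok(Embedded { path: args.path, bytes })
}

fn main() -> io::Result<()> {
    // With explicit bytes, the loader must never be called.
    let e = resolve_embed(
        EmbedArgs { path: "data.csv".into(), data: Some(b"a,b".to_vec()) },
        |_| panic!("path should not be read"),
    )?;
    println!("{} -> {} bytes", e.path, e.bytes.len());
    Ok(())
}
```

Keeping the path mandatory even when bytes are passed means the attachment always has a name to record in the PDF, while the bytes, when present, fully replace the file contents.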

Notes

  • csv.decode accepted str | bytes as data, whereas the new csv function always interprets a string as a path. If you already have a string, you need to convert it to bytes to make explicit that it is the payload and not a path. This conversion is very cheap thanks to More flexible and efficient Bytes representation #5670.
  • The deprecations exist only in the docs so far; there are no warnings yet. I plan to deal with that separately, alongside Make it possible to deprecate constants #5582.
  • The argument names were changed from path to source or sources in various places to account for the bytes case. This is a minor breaking change because the corresponding field names on the elements change as well.
  • A new DataSource enum is introduced to abstract over a path or bytes.
  • The new Derived<S, D> type introduces a new way to parse data at element construction time without introducing ugly #[internal] fields.
  • The new OneOrMultiple<T> type handles one or multiple paths/bytes (e.g. for bibliography or syntaxes) and is also used in a few other places.
  • The new ManuallyHash<T> type is useful to have non-hashable fields in structs without having to implement Hash manually.
  • Path autocompletions were extended to cover a few cases that were previously missing.
  • Image format autodetection now has basic support for SVG.
  • The details of syntect SyntaxSet construction changed slightly: there is now one syntax set per user syntax, because syntaxes are now parsed eagerly and the syntect API kinda forces us to. I don't expect any visible change in behaviour, but I'm noting it here in case I'm wrong.
  • Yes, you can now generate WASM bytes at runtime. Have fun writing a JIT in Typst.
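
To make the string-versus-bytes rule from the notes concrete, here is a minimal Rust sketch of a path-or-bytes enum in the spirit of the new DataSource. The variant names, the Value enum standing in for Typst values, the in-memory file table, and the load method are assumptions for illustration; the real type and its casting rules live in the Typst codebase and may differ.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a Typst value passed to csv, json, etc.
#[derive(Debug, Clone)]
enum Value {
    Str(String),
    Bytes(Vec<u8>),
}

/// Path-or-bytes abstraction (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
enum DataSource {
    Path(String),
    Bytes(Vec<u8>),
}

impl From<Value> for DataSource {
    fn from(value: Value) -> Self {
        match value {
            // A string is ALWAYS treated as a path, never as the payload.
            Value::Str(path) => DataSource::Path(path),
            // Bytes are the payload itself.
            Value::Bytes(bytes) => DataSource::Bytes(bytes),
        }
    }
}

impl DataSource {
    /// Return the payload, looking paths up in a hypothetical in-memory file table.
    fn load(&self, files: &HashMap<String, Vec<u8>>) -> Result<Vec<u8>, String> {
        match self {
            DataSource::Path(path) => files
                .get(path)
                .cloned()
                .ok_or_else(|| format!("file not found: {path}")),
            DataSource::Bytes(bytes) => Ok(bytes.clone()),
        }
    }
}

fn main() {
    let mut files = HashMap::new();
    files.insert("data.csv".to_string(), b"a,b\n1,2".to_vec());

    // A string names a file...
    let from_path = DataSource::from(Value::Str("data.csv".into()));
    // ...so CSV text held in a string must be converted to bytes first.
    let inline = DataSource::from(Value::Bytes("a,b\n1,2".as_bytes().to_vec()));
    assert_eq!(from_path.load(&files), inline.load(&files));

    // Passing the CSV text itself as a string is looked up as a path and fails.
    let mistaken = DataSource::from(Value::Str("a,b\n1,2".into()));
    assert!(mistaken.load(&files).is_err());
}
```

In Typst terms, this is why csv(bytes(text)) is needed where csv.decode(text) used to work.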
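
The three helper types described in the notes (Derived<S, D>, OneOrMultiple<T>, ManuallyHash<T>) can be approximated as follows. This is a hypothetical Rust sketch written from their one-line descriptions, not a copy of Typst's definitions: the field names, the constructors, the Sources alias, and the toy "parse" step (splitting into lines) are all assumptions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// One or multiple items, normalized to a Vec (hypothetical shape).
#[derive(Debug, Clone, PartialEq, Hash)]
struct OneOrMultiple<T>(Vec<T>);

impl<T> OneOrMultiple<T> {
    fn one(item: T) -> Self {
        Self(vec![item])
    }
    fn multiple(items: Vec<T>) -> Self {
        Self(items)
    }
}

/// A value without a Hash impl, paired with a precomputed hash.
struct ManuallyHash<T> {
    hash: u64,
    value: T,
}

impl<T> ManuallyHash<T> {
    fn new(value: T, hash: u64) -> Self {
        Self { hash, value }
    }
}

impl<T> Hash for ManuallyHash<T> {
    fn hash<H: Hasher>(&self, state: &mut H) {
        // Only the stored hash is fed to the hasher; `value` is ignored.
        self.hash.hash(state);
    }
}

/// User-facing source plus data derived from it at construction time.
#[derive(Hash)]
struct Derived<S, D> {
    source: S,
    derived: D,
}

impl<S, D> Derived<S, D> {
    fn new(source: S, derived: D) -> Self {
        Self { source, derived }
    }
}

/// Toy "parsed" result that deliberately does not implement Hash.
struct Parsed {
    lines: Vec<String>,
}

fn hash_of<T: Hash>(value: &T) -> u64 {
    let mut hasher = DefaultHasher::new();
    value.hash(&mut hasher);
    hasher.finish()
}

/// Toy parser: split a source into lines.
fn parse(bytes: &[u8]) -> Parsed {
    let text = String::from_utf8_lossy(bytes);
    Parsed { lines: text.lines().map(str::to_owned).collect() }
}

type Sources = OneOrMultiple<Vec<u8>>;

/// Parse eagerly at construction; the parsed data is hashed via its source.
fn build(sources: Sources) -> Derived<Sources, ManuallyHash<Vec<Parsed>>> {
    let parsed: Vec<Parsed> = sources.0.iter().map(|b| parse(b)).collect();
    let hash = hash_of(&sources);
    Derived::new(sources, ManuallyHash::new(parsed, hash))
}

fn main() {
    let one = build(OneOrMultiple::one(b"a\nb".to_vec()));
    let many = build(OneOrMultiple::multiple(vec![b"x".to_vec(), b"y\nz".to_vec()]));
    println!("{} source(s), {} line(s) in the first", one.source.0.len(), one.derived.value[0].lines.len());
    println!("{} source(s)", many.source.0.len());
    // Equal sources give equal hashes even though Parsed has no Hash impl.
    let again = build(OneOrMultiple::one(b"a\nb".to_vec()));
    assert_eq!(hash_of(&one), hash_of(&again));
}
```

The design point this illustrates: the expensive, non-hashable parse result is computed once when the value is built, while anything keyed by the hash (such as a memoization cache) only ever hashes the cheap source.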
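
Unlike PNG, JPEG, and GIF, SVG has no binary magic number, so basic autodetection for it has to rely on inspecting the text. The Rust sketch below shows one such heuristic alongside the standard magic-number checks for the raster formats. ImageFormat, detect_format, and the 1024-byte window are hypothetical, and the PR does not say how Typst actually detects SVG.

```rust
/// Image formats this sketch can recognize.
#[derive(Debug, PartialEq)]
enum ImageFormat {
    Png,
    Jpg,
    Gif,
    Svg,
}

/// Guess the format from the leading bytes. Raster formats are matched by
/// their well-known magic numbers; SVG has none, so a crude text heuristic
/// is used instead (hypothetical, not Typst's logic).
fn detect_format(data: &[u8]) -> Option<ImageFormat> {
    if data.starts_with(b"\x89PNG\r\n\x1a\n") {
        return Some(ImageFormat::Png);
    }
    if data.starts_with(&[0xFF, 0xD8, 0xFF]) {
        return Some(ImageFormat::Jpg);
    }
    if data.starts_with(b"GIF87a") || data.starts_with(b"GIF89a") {
        return Some(ImageFormat::Gif);
    }
    // SVG is XML text: inspect only a bounded prefix, decoding lossily so a
    // multi-byte character cut at the boundary cannot cause an error.
    let head = &data[..data.len().min(1024)];
    let text = String::from_utf8_lossy(head);
    if text.contains("<svg") {
        return Some(ImageFormat::Svg);
    }
    None
}

fn main() {
    let svg = br#"<?xml version="1.0"?><svg xmlns="http://www.w3.org/2000/svg"/>"#;
    println!("{:?}", detect_format(svg));
    println!("{:?}", detect_format(b"GIF89a..."));
}
```

A heuristic this simple can be fooled, for example by an HTML file that contains an inline svg element, which is one reason to treat it only as a fallback after the unambiguous magic-number checks.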

Issues

@laurmaedje force-pushed the revamp-data-loading branch from ab936bb to 3b74d4c on January 8, 2025 22:22
@laurmaedje added this pull request to the merge queue on Jan 9, 2025
Merged via the queue into main with commit e2b37fe on Jan 9, 2025
12 checks passed
@laurmaedje deleted the revamp-data-loading branch on January 9, 2025 09:39