separate decode from string/bytes for all data functions; and encode for json, toml, yaml via serde #1935

Merged
laurmaedje merged 29 commits into typst:main from Beiri22:data_deen on Aug 25, 2023

@Beiri22
Contributor

@Beiri22 Beiri22 commented Aug 14, 2023

Resolves #1647. This allows decoding data from strings and bytes as well, and adds encoding for JSON, TOML and YAML. The new functions live within the scope of the original functions, e.g.

json();
json.encode();
json.decode();

@Beiri22 Beiri22 mentioned this pull request Aug 21, 2023
@Beiri22
Contributor Author

Beiri22 commented Aug 22, 2023

I thought about moving the convert_* logic out of data.rs into a real Deserialize implementation.

@laurmaedje
Member

I thought about moving the convert_* logic out of data.rs into a real Deserialize implementation.

If that works, that'd be great, especially considering the existing Serialize implementation.

@Beiri22
Contributor Author

Beiri22 commented Aug 22, 2023

There are a few problems:

  • Serializing Bytes as readable is just Bytes(some size) as a string, probably not what you want.
  • everything serialized as string (..., via repr) will be deserialized as string.
  • supporting $__toml_private_datetime is nice, but not universal ... do we need this?

Maybe we should use some kind of enum tagging for all types that are not native to the format (i.e. everything except numbers, strings, bool, none)?

@laurmaedje
Member

Serializing Bytes as readable is just Bytes(some size) as a string, probably not what you want.

I think that's fine to be honest. You don't really want to serialize a large byte buffer to a JSON array, it will be horribly slow (which is precisely why it's serialized that way, otherwise typst query would fill your whole terminal with image data when you query for a figure). If you need to have binary data in JSON, you can explicitly convert it to an array. Otherwise, just don't serialize binary data in a human-readable format.
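
To make the trade-off concrete, here is a minimal sketch in Python rather than Typst's Rust. The `placeholder_default` helper and the sample data are hypothetical and are not Typst's implementation; the sketch only shows the idea of a short placeholder string by default, with an explicit opt-in conversion to an array.

```python
import json


def placeholder_default(value):
    """Fallback for values json cannot encode natively (hypothetical helper)."""
    if isinstance(value, (bytes, bytearray)):
        # Mirror the "Bytes(some size)" idea: keep the output short and readable.
        return f"Bytes({len(value)})"
    raise TypeError(f"cannot encode {type(value).__name__}")


# Stand-in for 1 KiB of binary data such as an image.
image = bytes(range(256)) * 4

# Default path: a compact placeholder instead of a 1024-element array.
compact = json.dumps({"image": image}, default=placeholder_default)

# Opt-in path: the caller converts the buffer to an array explicitly.
explicit = json.dumps({"image": list(image)})

print(compact)
print(len(explicit), "characters for the explicit array")
```

The placeholder is lossy by design; only the explicit array can be decoded back into the original bytes.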

everything serialized as string (..., via repr) will be deserialized as string

Maybe we should use some kind of enum tagging for all types that are not native to the format (i.e. everything except numbers, strings, bool, none)?

We could but I'm not sure it would be good. It is not to be expected that serialize -> deserialize roundtrip of arbitrary Typst values is lossless. I think adding tags would be quite heavy. For plugin usage, you generally want to have your input data expressed mostly in terms of primitives that just work with JSON.
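
A small Python sketch of why such a roundtrip is lossy once a `repr` fallback is involved. This is an analogy under stated assumptions, not Typst code: `Fraction` merely stands in for a Typst value that JSON has no native type for (such as a length), and `repr_default` is a hypothetical helper.

```python
import json
from fractions import Fraction


def repr_default(value):
    # Anything JSON has no native type for falls back to its repr string.
    return repr(value)


original = {"n": 1, "s": "text", "ratio": Fraction(1, 3)}

encoded = json.dumps(original, default=repr_default)
decoded = json.loads(encoded)

# Primitives survive; the Fraction comes back as a plain string, and the
# decoder cannot tell it apart from a string that was a string all along.
print(decoded)
```

Without tags, the decoder has no way to know that `"Fraction(1, 3)"` was ever anything other than text, which is the ambiguity discussed above.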

supporting $__toml_private_datetime is nice, but not universal ... do we need this

I am not familiar with the $__toml_private_datetime notation. Is this some serde way to have TOML datetimes deserialize into Typst datetimes? In general, I think it would be unfortunate to regress on that being supported.

@PgBiel
Contributor

PgBiel commented Aug 22, 2023

Serializing Bytes as readable is just Bytes(some size) as a string, probably not what you want.

I think that's fine to be honest. You don't really want to serialize a large byte buffer to a JSON array, it will be horribly slow (which is precisely why it's serialized that way, otherwise typst query would fill your whole terminal with image data when you query for a figure). If you need to have binary data in JSON, you can explicitly convert it to an array. Otherwise, just don't serialize binary data in a human-readable format.

With that said (and going on a bit of a tangent), would it make sense to add native BSON conversion to Typst (with the same API as json(), including the proposed functions from this PR)? Perhaps that could be yet another handy tool for plugin creators.

@laurmaedje
Member

laurmaedje commented Aug 23, 2023

With that said (and going on a bit of a tangent), would it make sense to add native BSON conversion to Typst (with the same API as json(), including the proposed functions from this PR)? Perhaps that could be yet another handy tool for plugin creators.

I would be ok with that. Although I wonder if bson or bincode is the more "blessed" binary encoding.

@Beiri22
Contributor Author

Beiri22 commented Aug 23, 2023

I'm just experimenting with the following:
In JSON/TOML, serialize normally; but in YAML, where !tags are supported, insert the type information as a tag.

JSON

{
  "a": 1,
  "b": [
    1,
    2,
    3.1415,
    "Test"
  ],
  "c": {
    "func": "text",
    "text": "Aha"
  },
  "d": [
    true,
    false
  ],
  "e": null,
  "f": [
    "28.35pt",
    "90deg"
  ]
}

TOML

a = 1
b = [
  1,
  2,
  3.1415,
  "Test",
]
d = [
  true,
  false,
]
f = ["28.35pt", "90deg"]

[c]
func = "text"
text = "Aha"

YAML

a: 1
b:
  - 1
  - 2
  - 3.1415
  - Test
c: !Content
  func: text
  text: Aha
d:
  - true
  - false
e: null
f:
  - !length 28.35pt
  - !angle 90deg

@laurmaedje
Member

Interesting, I didn't know of that YAML feature.

@Beiri22
Contributor Author

Beiri22 commented Aug 23, 2023

Is there a convenient way to get back from a repr()-string to a Value? I would use eval(), but don't have a World.

Member

@laurmaedje laurmaedje left a comment

@PgBiel do you still think that allowing bytes as input to json.decode and friends is problematic? because it's currently implemented that way. one upside is that this way it is more efficient to deserialize plugin output. converting to a string has not only the UTF-8 check but also a full copy from a wonderful prehashed byte buffer to a non-prehashed EcoString.

@laurmaedje
Member

Is there a convenient way to get back from a repr()-string to a Value? I would use eval(), but don't have a World.

No. But this would only be useful for the tagged YAML thing anyway right? Because otherwise you can't know whether it was just a normal string. To be honest, I think nobody will use the tagged YAML thing, I'd just skip it and not try to do anything smart with unrepr-ing a string. repr also doesn't always produce evaluatable output.

@PgBiel
Contributor

PgBiel commented Aug 23, 2023

@PgBiel do you still think that allowing bytes as input to json.decode and friends is problematic? because it's currently implemented that way. one upside is that this way it is more efficient to deserialize plugin output. converting to a string has not only the UTF-8 check but also a full copy from a wonderful prehashed byte buffer to a non-prehashed EcoString.

I think it's fine (I'm guessing it will check for UTF-8 validity either way). Ship it 🚀

(Although string should still be an option of course)
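
As a rough sketch of the API shape being agreed on here (one decode entry point that takes either a string or bytes and validates UTF-8 in the bytes case), in Python rather than Typst's Rust. `decode_json` is a hypothetical name, not Typst's function. Unlike the Rust implementation discussed above, Python has to build a string from the buffer, so this shows only the interface, not the copy that the bytes path avoids.

```python
import json
from typing import Any, Union


def decode_json(data: Union[str, bytes]) -> Any:
    """Decode JSON from text or from raw bytes (hypothetical helper)."""
    if isinstance(data, bytes):
        # The UTF-8 check happens here either way; callers do not need to
        # convert plugin output to a string themselves.
        data = data.decode("utf-8")  # raises UnicodeDecodeError if invalid
    return json.loads(data)


print(decode_json('{"a": 1}'))
print(decode_json(b'{"a": 1}'))
```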

@PgBiel
Contributor

PgBiel commented Aug 23, 2023

With that said (and going on a bit of a tangent), would it make sense to add native BSON conversion to Typst (with the same API as json(), including the proposed functions from this PR)? Perhaps that could be yet another handy tool for plugin creators.

I would be ok with that. Although I wonder if bson or bincode is the more "blessed" binary encoding.

I think we can have support for both in the future if there is demand for that. I just thought BSON (for now) would be more generic and probably easier to use across different languages.

@Beiri22
Contributor Author

Beiri22 commented Aug 24, 2023

The discussion might be a different one, but the implementation is related.

@PgBiel
Contributor

PgBiel commented Aug 24, 2023

The discussion might be a different one, but the implementation is related.

Of course, but the PR becomes easier to approve and reason about when it's more self-contained.

@laurmaedje
Member

I tend to agree that a new format should be a separate PR.

@Beiri22
Contributor Author

Beiri22 commented Aug 25, 2023

2 : 1 ok. ;-)

This reverts commit 0a310dd.
This reverts commit d3387ee.
This reverts commit 8da23b7.
This reverts commit f95fa7f.
This reverts commit bdf4bcd.
@Beiri22
Contributor Author

Beiri22 commented Aug 25, 2023

Since the separate PR with the new format(s) will depend on this one, I will open it once this one is done.

@Beiri22
Contributor Author

Beiri22 commented Aug 25, 2023

Please consider #1997 to calm clippy down even more and get a test pass...

@laurmaedje laurmaedje merged commit 22b5959 into typst:main Aug 25, 2023
@laurmaedje
Member

Thank you!

@Beiri22 Beiri22 deleted the data_deen branch August 25, 2023 12:34
YDX-2147483647 added a commit to YDX-2147483647/typst that referenced this pull request Aug 11, 2025
Resolves typst#6738

- Change `serialize_str(bytes)` from `Debug::fmt` (_Bytes(n)_) to `repr` (_bytes(n)_) and add tests.
- Update docs related to data loading and `repr`.

References:
- Initial discussions in typst#1935
- [serde_json 1.0.138](https://docs.rs/serde_json/1.0.138/serde_json/value/enum.Value.html)
- [serde_yaml 0.8.26](https://docs.rs/serde_yaml/0.8.26/serde_yaml/enum.Value.html)
YDX-2147483647 added a commit to YDX-2147483647/typst that referenced this pull request Aug 11, 2025
Resolves typst#6738

- Update docs related to data loading and `repr`.
- Change `serialize_str(bytes)` from `Debug::fmt` (_Bytes(n)_) to `repr` (_bytes(n)_).
- Narrow the input of `toml.decode` and the output of `toml` from `Value` to `Dict`.

References:
- Initial discussions in typst#1935
- [serde_json 1.0.138](https://docs.rs/serde_json/1.0.138/serde_json/value/enum.Value.html)
- [serde_yaml 0.8.26](https://docs.rs/serde_yaml/0.8.26/serde_yaml/enum.Value.html)
- [toml 0.8.19](https://docs.rs/toml/0.8.19/toml/enum.Value.html)
YDX-2147483647 added a commit to YDX-2147483647/typst that referenced this pull request Aug 11, 2025
Resolves typst#6738

- Update docs related to data loading and `repr`.

  - Add a _Conversion details_ section to each format.

  - Mention that `cbor.encode` uses ciborium, and other implementations may not be able to parse its result.

  - Mention that `*.encode` may fall back to `repr`, and explain common confusions.

  - Fix a few copy-and-paste errors.

  - Use the terms of each format. For instance, JSON object, YAML mapping, TOML table, CBOR map.
    (People are really good at coining names.)

- Change `serialize_str(bytes)` from `Debug::fmt` to `repr`.

  That is, _Bytes(n)_ → _bytes(n)_ for human readable formats (JSON, YAML, TOML).

- Narrow the input of `toml.decode` and the output of `toml` from `Value` to `Dict`.

  Because TOML documents can only be tables.

References:

- Initial discussions in typst#1935
- [serde_json 1.0.138](https://docs.rs/serde_json/1.0.138/serde_json/value/enum.Value.html)
- [serde_yaml 0.8.26](https://docs.rs/serde_yaml/0.8.26/serde_yaml/enum.Value.html)
- [toml 0.8.19](https://docs.rs/toml/0.8.19/toml/enum.Value.html)
- [ciborium 0.2.2](https://docs.rs/ciborium/0.2.2/ciborium/enum.Value.html)
Development

Successfully merging this pull request may close these issues.

Bidirectional conversion between typst values and json/yaml/toml...

3 participants