I’ve noticed that TOON still spends a few extra tokens in some cases, specifically on small nested structures where each object has a single key and the chain ends in an array.
Example
Input
{
  "data": {
    "metadata": {
      "items": ["a", "b"]
    }
  }
}
Current Output
data:
  metadata:
    items[2]: a,b
Proposed Output
data.metadata.items[2]: a,b
The idea here is to squeeze out a few more tokens in these edge cases, following how OmegaConf handles dotted keys. Whenever a dictionary has more than one key, we can fall back to the current implementation.
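To make the rule concrete, here is a rough Python sketch of the folding step as a pre-pass over the parsed JSON. `fold_single_key_chains` is a hypothetical helper I made up for illustration; it is not part of TOON or OmegaConf. It also assumes that keys which already contain a literal `.` should be left unfolded to avoid ambiguity.

```python
from typing import Any


def fold_single_key_chains(obj: Any) -> Any:
    """Collapse chains of single-key dicts into one dotted key (hypothetical helper)."""
    if isinstance(obj, list):
        return [fold_single_key_chains(item) for item in obj]
    if not isinstance(obj, dict):
        return obj

    folded = {}
    for key, value in obj.items():
        path = [key]
        # Keep descending while the value is a dict with exactly one key.
        # Assumption: keys containing a literal "." are never folded.
        while (
            "." not in path[-1]
            and isinstance(value, dict)
            and len(value) == 1
            and "." not in next(iter(value))
        ):
            next_key, value = next(iter(value.items()))
            path.append(next_key)
        # Dicts with zero or several keys stop the chain and are handled recursively.
        folded[".".join(path)] = fold_single_key_chains(value)
    return folded


example = {"data": {"metadata": {"items": ["a", "b"]}}}
print(fold_single_key_chains(example))
```

For the input above this yields a single `data.metadata.items` key, which the encoder could then emit on one line. A dictionary with more than one key is left nested, matching the fallback described above. A real implementation would probably also need to handle a folded path colliding with an existing literal dotted key, and the decoder would need to expand dotted keys again.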
TOON is definitely exceptional work; I'm just giving my two cents.