Conversation
|
@HDembinski, @jpivarski, please sign off if you like it (you can also suggest a name if you don't like "library"). |
|
I'd be confused by
I looked at JPEG EXIFs, and they use the word XMP (used in formats like PDF) would use In the end, how about |
|
What about |
henryiii
left a comment
This is what "writer_info" would look like.
|
Do you want to separate out a version field? Otherwise, a single free-text field would get filled inconsistently with versioned and unversioned data, with a space, hyphen, or something else separating the library name from the version number. (It's probably not much of a problem, I'm just asking.) |
|
The current proposal looks like this:
{
  "writer_info": {
    "boost-histogram": {
      "version": "1.0.0",
      (1)
    }
  }
  (2)
}
The version is at |
|
It's also possible that if we put all the format writers here in
{
  "writer_info": {
    "boost-histogram": {
      "version": "1.6.0"
    },
    "uhi": {
      "version": "0.6.0"
    }
  }
}
The version of boost-histogram that produced the serialization struct would be recorded, and the version of uhi that converted that struct into HDF5, zip, zarr, or whatever could be recorded as well. |
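To make the two-writer case concrete, here is a minimal Python sketch of how a converter could add its own entry next to the producer's. The helper name `add_writer_info` and the bare dict standing in for the serialization struct are assumptions for illustration, not part of uhi or boost-histogram.

```python
import copy


def add_writer_info(hist_dict, library, version):
    """Return a copy of hist_dict with `library` recorded under "writer_info".

    Hypothetical helper: entries from other libraries are left untouched.
    """
    result = copy.deepcopy(hist_dict)
    info = result.setdefault("writer_info", {})
    info.setdefault(library, {})["version"] = version
    return result


# Stand-in for a struct produced by boost-histogram (only writer_info shown).
serialized = {"writer_info": {"boost-histogram": {"version": "1.6.0"}}}

# The converter stamps its own version before writing HDF5, zip, zarr, etc.
stamped = add_writer_info(serialized, "uhi", "0.6.0")
```

Keying by library name is what lets both entries coexist; a single free-text field would force one writer to overwrite or append to the other.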
|
I am good with writer_info and the suggested use cases. I see some tension in using a structured storage format for what I understand is meant to be metadata for human consumption. Using a structured format like a dict suggests that the information is designed to be read and used by machines. For pure metadata that is only for human consumption, I would use a string. |
|
This is naturally (one level) structured data, and doesn't have to be only for human consumption. A utility like uproot-browser could show the library and version number if it's present. And the round trip example requires machine consumption. For example, boost-histogram could record Even if improved round trip support is never added when loading, it's better to let the format record this where we can access it later if we want to, rather than changing the format down the road. You could even just manually process the file and detect which axes were originally filled with growth on, for example. It's also generally more space efficient to not combine this into an arbitrary readable string. The important thing is that it's not required to read a histogram, so libraries can load each other's histograms. It's basically exactly
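As a rough illustration of the machine-readable but optional use described above, a viewer could list the writers when the field is present and carry on when it is not. The function name `describe_writers` is made up for this sketch, and it assumes the one-level layout proposed in this thread.

```python
def describe_writers(hist_dict):
    """Return display strings such as "uhi 0.6.0" for each recorded writer.

    Hypothetical helper: a missing "writer_info" or a missing "version"
    is not an error, since the field is not required to read a histogram.
    """
    writers = hist_dict.get("writer_info", {})
    lines = []
    for library, info in sorted(writers.items()):
        version = info.get("version")
        lines.append(library if version is None else f"{library} {version}")
    return lines
```

A reader that ignores `writer_info` entirely still loads the histogram, which is the property that lets libraries read each other's files.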
|
Signed-off-by: Henry Schreiner <[email protected]>
Update src/uhi/resources/histogram.schema.json Apply suggestions from code review Update tests/resources/reg.json
|
This now shares #162, so strings, numbers, and bools are the only allowed entries. |
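For readers who want to see what that restriction means in practice, here is a small Python sketch of an equivalent check. The actual enforcement lives in the JSON schema; the function `invalid_entries` and the assumption that each library maps to a flat dict of fields are illustrative only.

```python
def invalid_entries(writer_info):
    """Return (library, key) pairs whose value is not a str, number, or bool.

    Illustrative only; the schema in the PR is the authoritative check.
    """
    allowed = (str, int, float, bool)
    bad = []
    for library, fields in writer_info.items():
        for key, value in fields.items():
            if not isinstance(value, allowed):
                bad.append((library, key))
    return bad
```

Under this rule nested lists or dicts inside a library's entry would be rejected, which keeps the field one level deep.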
|
Okay to go in? Would like at least one okay to proceed, and I want to make progress before the next IRIS-HEP Demo Days, where I'll talk about histogram serialization. |
|
Looks good to me! |
This adds a place for library-specific metadata, which allows the library and version to be recorded in the histogram. It is not required for reading.
We floated several ideas for this name; I thought of
"vendor" (it was inspired by the vendor field in CMakePresets.json), and we also considered "header". But since we decided to make the histogram library a key, "library" seems fitting. Open to suggestions, though. "writer_info" is another option.