Add example of converting RecordBatches to JSON objects #5364
alamb merged 4 commits into apache:master
arrow-json/src/writer.rs
```rust
//! let a = Int32Array::from(vec![1, 2, 3]);
//! let batch = RecordBatch::try_new(Arc::new(schema), vec![Arc::new(a)]).unwrap();
//!
//! let json_rows: Vec<Map<String, Value>> = todo!("How do we do this?");
```
@tustvold can you help, or point me at code that does what you are thinking of, so I can update the example?
I couldn't immediately see how to apply the suggestion you are making.
You can "parse" a serialized JSON string into a `RawValue`; this allows embedding it into existing serde flows without paying additional decoding overhead. There is no way to obtain a `Value` other than to parse the serialized JSON string, and this is expected. If this is insufficient for people's use cases, I would suggest they file a ticket with their requirements.
Got it. I will try to update the example to show reparsing the string into a JSON `Value`, with a note about performance.
Given that I am still very confused about how the `RawValue` API fits in here (perhaps because, as you hint, there is no clear use case), I am going to remove the mention from the docs to avoid confusion.
I wonder if people were potentially using the `serde_json` values as an intermediate representation to map RecordBatches to their own data structures via serde 🤔
Maybe we can point them to the https://crates.io/crates/serde_arrow crate for that use case 🤔
Say you have a larger JSON document that you want to embed the Arrow data into: you could parse the serialized output into a `RawValue` in order to embed it. That's the major use case I can think of.
> I wonder if people were potentially using the `serde_json` values as an intermediate representation to map RecordBatches to their own data structures via serde
I guess we shall find out 😅
I made a PR to serde_arrow with an example of how to use that crate to make Arrow arrays out of Rust structs: chmp/serde_arrow#131
So now I feel quite good about directing people there ❤️
````rust
//!
//! ```
//! # use std::sync::Arc;
//! # use arrow_array::{Int32Array, RecordBatch};
````
To be honest, I am not sure how much value this example has, other than demonstrating feature parity with previous releases.
I am similarly not immensely convinced of its utility.
FWIW, I put this example at the end of the docs, so hopefully it causes minimal confusion.
Which issue does this PR close?
Related to #5318
Rationale for this change
@tustvold deprecated `record_batches_to_json_rows` in #5318, but there are at least two of us (https://github.com/apache/arrow-rs/pull/5318/files#r1460432887) who are not quite sure how to use the existing or suggested APIs to achieve the same results.
While converting from Arrow to JSON objects may not be ideal for certain use cases, I think it is a common request, so we shouldn't cause trouble for users who were relying on it.
Thus, I think adding an example showing that #5318 doesn't regress functionality is warranted.
What changes are included in this PR?
Add a doc example about how to convert `RecordBatch`es to `serde_json` objects.
Are there any user-facing changes?
Better examples