Closed
Labels
arrow (Changes to the arrow crate), enhancement (Any new improvement worthy of a entry in the changelog), parquet (Changes to the parquet crate)
Description
This is a follow-up to #7878.
The variant spec states that the string values in the metadata dictionary must be UTF-8 encoded strings.
We do this check here:
arrow-rs/parquet-variant/src/variant/metadata.rs
Lines 250 to 252 in 387490a

    // Verify the string values in the dictionary are UTF-8 encoded strings.
    let value_buffer =
        string_from_slice(self.bytes, 0, self.first_value_byte as _..self.bytes.len())?;

Since we offer simdutf8 as an optional dependency in other crates, we could do the same when performing the validation above. See @Dandandan's comment.
The rough idea being:
If simdutf8 is supported, do:

    let value_str = simdutf8::basic::from_utf8(value_buffer)?;

else, default to the existing implementation.
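To make the rough idea concrete, here is a minimal sketch of how the switch could be expressed with conditional compilation. It assumes a Cargo feature named `simdutf8` that enables the optional `simdutf8` dependency. The `validate_utf8` helper and the `InvalidUtf8` error type are hypothetical names used only for illustration; the real change would plug into the existing `string_from_slice` path and the crate's own error type.

```rust
/// Hypothetical error type for this sketch; the crate would use its own error.
#[derive(Debug, PartialEq)]
pub struct InvalidUtf8;

/// SIMD-accelerated validation, compiled only when the (assumed) `simdutf8`
/// Cargo feature is enabled and the optional dependency is available.
#[cfg(feature = "simdutf8")]
pub fn validate_utf8(bytes: &[u8]) -> Result<&str, InvalidUtf8> {
    simdutf8::basic::from_utf8(bytes).map_err(|_| InvalidUtf8)
}

/// Fallback: the standard library validator, used when the feature is off.
#[cfg(not(feature = "simdutf8"))]
pub fn validate_utf8(bytes: &[u8]) -> Result<&str, InvalidUtf8> {
    std::str::from_utf8(bytes).map_err(|_| InvalidUtf8)
}
```

Callers would see the same signature either way, for example `let value_str = validate_utf8(value_bytes)?;`, so the feature only changes which validator runs. Note that `simdutf8::basic` reports only that validation failed, without the position of the invalid byte, which is one reason the sketch maps both errors to a single opaque type.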