@alamb alamb commented Jun 13, 2025

Which issue does this PR close?

Rationale for this change

While making documentation / examples for working with Variant in #7661, I found it was somewhat awkward to make Variant values directly from the metadata and value. Specifically, you have to:

let metadata = [0x01, 0x00, 0x00];
let value = [0x09, 0x48, 0x49];
// parse the header metadata
let metadata = VariantMetadata::try_new(&metadata).unwrap();
// and only then can you make the Variant
Variant::try_new(&metadata, &value).unwrap()

I would really like to be able to create a Variant directly from metadata and value without having to make a VariantMetadata structure first.
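For context, here is a minimal sketch of what the two byte arrays in the example encode, assuming the bit layout described in the Parquet Variant encoding spec. The names `MetadataHeader`, `parse_metadata_header`, and `decode_short_string` are hypothetical helpers written for this illustration; they are not part of the parquet crate and do not reflect how it parses values internally.

```rust
/// Fields of the one-byte metadata header (layout assumed from the Variant spec).
#[derive(Debug, PartialEq)]
pub struct MetadataHeader {
    pub version: u8,
    pub sorted_strings: bool,
    pub offset_size: u8, // bytes per offset, 1..=4
}

/// Split the metadata header byte into its bit fields.
pub fn parse_metadata_header(byte: u8) -> MetadataHeader {
    MetadataHeader {
        version: byte & 0x0F,                  // bits 0-3
        sorted_strings: (byte >> 4) & 1 == 1,  // bit 4
        offset_size: ((byte >> 6) & 0x03) + 1, // bits 6-7 store the size minus one
    }
}

/// Decode a value that uses the "short string" basic type (low two bits == 1).
/// Returns None for any other basic type, truncated input, or invalid UTF-8.
pub fn decode_short_string(value: &[u8]) -> Option<&str> {
    let (&header, rest) = value.split_first()?;
    if header & 0x03 != 1 {
        return None;
    }
    let len = (header >> 2) as usize; // upper six bits hold the length
    std::str::from_utf8(rest.get(..len)?).ok()
}
```

Under these assumptions, `0x01` is a version 1 header with one-byte offsets, the following `0x00, 0x00` are an empty dictionary (size 0 and a single offset of 0), and `[0x09, 0x48, 0x49]` is the two-character short string "HI".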

What changes are included in this PR?

This PR proposes a small change to the API so creating a Variant now looks like:

let metadata = [0x01, 0x00, 0x00];
let value = [0x09, 0x48, 0x49];
// You can now make the Variant directly from the metadata and value
Variant::try_new(&metadata, &value).unwrap()

Are there any user-facing changes?

Yes, the API for creating Variants is slightly different (and I think better).

@github-actions github-actions bot added the parquet Changes to the parquet crate label Jun 13, 2025
 #[derive(Clone, Copy, Debug, PartialEq)]
 pub struct VariantObject<'m, 'v> {
-    pub metadata: &'m VariantMetadata<'m>,
+    pub metadata: VariantMetadata<'m>,
Contributor Author

This is the core difference: rather than holding a reference to a VariantMetadata, Objects and Arrays now hold the VariantMetadata inline (and the VariantMetadata itself holds references to the underlying &[u8]).

Note that VariantMetadata is a pointer and a few usize, so I think it is better to inline it anyway rather than have an extra layer of indirection.
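To illustrate why inlining makes the simpler constructor possible, here is a rough sketch using hypothetical stand-in types (`Meta`, `ObjByRef`, `ObjInline`). These are simplified for the example and are not the actual definitions in the parquet crate; in particular, `Meta::try_new` assumes one-byte offsets.

```rust
/// Hypothetical stand-in for VariantMetadata: a small Copy view over the bytes.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Meta<'m> {
    pub bytes: &'m [u8],
    pub dictionary_size: usize,
}

impl<'m> Meta<'m> {
    pub fn try_new(bytes: &'m [u8]) -> Result<Self, String> {
        // Simplified: assumes one-byte offsets, so byte 1 is the dictionary size.
        let size = *bytes.get(1).ok_or("metadata too short")?;
        Ok(Self { bytes, dictionary_size: size as usize })
    }
}

/// Old shape: borrows a Meta that the caller must build and keep alive.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct ObjByRef<'m, 'v> {
    pub metadata: &'m Meta<'m>,
    pub value: &'v [u8],
}

/// New shape: stores the Meta by value, so a constructor can create it.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct ObjInline<'m, 'v> {
    pub metadata: Meta<'m>,
    pub value: &'v [u8],
}

impl<'m, 'v> ObjInline<'m, 'v> {
    /// Builds the metadata view internally. The by-reference shape cannot offer
    /// this signature: it would have to return a reference to a local `Meta`.
    pub fn try_new(metadata: &'m [u8], value: &'v [u8]) -> Result<Self, String> {
        Ok(Self { metadata: Meta::try_new(metadata)?, value })
    }
}
```

With the by-reference shape, the caller has to create the `Meta` and keep it alive for as long as the object exists; with the inline shape, both lifetimes tie directly to the underlying byte slices.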

Contributor

@scovich scovich Jun 14, 2025

One fat pointer, two usize, and a VariantMetadataHeader that should easily pack to one usize. So 40B instead of 16B. That does seem within the range of "probably doesn't matter" when it comes to memory consumption. And this avoids a double pointer indirection when accessing the metadata.

We could also shrink the header size down (by storing the field sizes as i32 for example) 🤔 I'll make a follow on PR to explore what that would look like
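The size arithmetic can be checked with `std::mem::size_of`. The sketch below uses hypothetical stand-in structs that mirror the description (one fat pointer, two size fields, a small header); the field names and the three-byte header are assumptions, and the real types may be laid out differently. `#[repr(C)]` is used only to make the layout of the stand-ins predictable.

```rust
use std::mem::size_of;

/// Hypothetical three-byte header (version, sorted flag, offset width).
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Header {
    pub version: u8,
    pub is_sorted: bool,
    pub offset_size: u8,
}

/// Stand-in for the inlined metadata: fat pointer + two usize + header.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct InlineMeta<'m> {
    pub bytes: &'m [u8],
    pub dictionary_size: usize,
    pub dictionary_key_start: usize,
    pub header: Header,
}

/// Same fields with 32-bit sizes, along the lines floated in the comment.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct CompactMeta<'m> {
    pub bytes: &'m [u8],
    pub dictionary_size: u32,
    pub dictionary_key_start: u32,
    pub header: Header,
}

/// Returns (inline size, compact size, size of a plain reference) in bytes.
pub fn sizes() -> (usize, usize, usize) {
    (
        size_of::<InlineMeta<'static>>(),
        size_of::<CompactMeta<'static>>(),
        size_of::<&'static InlineMeta<'static>>(),
    )
}
```

For these stand-ins, the inline struct comes to five pointer-sized words (40 bytes on a 64-bit target), a plain reference to it is one pointer-sized word, and narrowing the two size fields saves one word on 64-bit targets.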

@alamb alamb marked this pull request as ready for review June 13, 2025 15:57
@alamb alamb changed the title [Variant] Simplify creation of Variants [Variant] Simplify creation of Variants from metadata and value Jun 13, 2025
Contributor Author

alamb commented Jun 13, 2025

FYI @scovich @mkarbo @PinkCrow007 @superserious-dev

Contributor

@scovich scovich left a comment

LGTM

Contributor Author

alamb commented Jun 16, 2025

To avoid conflicts, I am going to merge this PR, and I will be happy to handle any other suggestions / comments in follow-on PR(s).

I would like to unblock additional documentation and testing.

@alamb alamb merged commit 3a15f84 into apache:main Jun 16, 2025
13 checks passed
@alamb alamb deleted the alamb/simpler_variant_api branch June 16, 2025 17:56