Skip to content

Dataframe v2: new and improved chunk tools#7649

Merged
teh-cmc merged 2 commits intomainfrom
cmc/dataframev2_chunk_tools
Oct 10, 2024
Merged

Dataframe v2: new and improved chunk tools#7649
teh-cmc merged 2 commits intomainfrom
cmc/dataframev2_chunk_tools

Conversation

@teh-cmc
Copy link
Copy Markdown
Contributor

@teh-cmc teh-cmc commented Oct 9, 2024

Bunch of improvements and/or additions to the Chunk toolbox that happened as part of the implementation of the dataframe v2 API.

Checklist

  • I have read and agree to Contributor Guide and the Code of Conduct
  • I've included a screenshot or gif (if applicable)
  • I have tested the web demo (if applicable):
  • The PR title and labels are set such as to maximize their usefulness for the next release's CHANGELOG
  • If applicable, add a new check to the release checklist!
  • If have noted any breaking changes to the log API in CHANGELOG.md and the migration guide

To run all checks from main, comment on the PR with @rerun-bot full-check.

@teh-cmc teh-cmc added enhancement New feature or request 🔍 re_query affects re_query itself include in changelog labels Oct 9, 2024
/// WARNING: the returned chunk has the same old [`crate::ChunkId`]! Change it with [`Self::with_id`].
#[must_use]
#[inline]
pub fn components_removed(self) -> Self {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

chunk.without_components() reads more intuitively to me but I don't feel strongly

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying (hard) to keep to the seemingly de-facto arrow standard of using past participles (I think that's what they're called?) for methods that take ownership, filter and return a new one.


/// Applies a [take] kernel to the [`Chunk`] as a whole.
///
/// In release builds, indices are allowed to have null entries (they will be taken as `null`s).
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What are the situations that cause us to query with null indices? Seems like returning a ChunkResult here and always making that an error condition would be preferable.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't, but this is technically part of the public Rust API, so I don't want to punish end users trying to do something that is perfectly valid and apparently well accepted in the broader ecosystem (whether its panics or results, they're both extremely annoying in these filter chains).

@teh-cmc teh-cmc merged commit ab69022 into main Oct 10, 2024
@teh-cmc teh-cmc deleted the cmc/dataframev2_chunk_tools branch October 10, 2024 07:20
teh-cmc added a commit that referenced this pull request Oct 10, 2024
Support clear semantics in the dataframe API.
Tombstones are never visible to end-users, only their effect.

Like every other Dataframe v2 feature PR, and following recommendations
from @jleibs, this prioritizes convenience of implementation over
everything else, for now.
All clear chunks are fetched, post-processed, and re-injected into the
view contents during init(), and then the streaming join runs as usual
after that.

Static clear semantics can get pretty unhinged, but that's A) not
specific to the dataframe API and B) so extremely niche that our time is
better spent on real-world problems right now:
- #7650 
- #7631 

---

- Fixes #7495
- Fixes #7414
- Fixes #7468
- Fixes #7493
- DNM: requires #7649
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

enhancement New feature or request 🔍 re_query affects re_query itself

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants