Split metadata tables into separate modules by rshkv · Pull Request #872 · apache/iceberg-rust

rshkv · 2025-01-03T14:44:59Z

Split metadata tables into separate modules.

This addresses #863 (comment), where the point was made that metadata_scan.rs will grow unwieldy if we put all metadata table implementations in it, especially as we are going to add extra utilities for those metadata tables.

The structure in this PR is:

inspect/
  metadata_table.rs: contains MetadataTable
  snapshots.rs: contains "snapshots" table
  manifests.rs: contains "manifests" table
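
As a rough illustration of the layout listed above, the sketch below inlines the three files as nested modules in a single file. The type bodies, the `name()` helper, and the re-exports are assumptions made for illustration, not the crate's actual code; only the module names and `MetadataTable` come from this PR.

```rust
// Single-file stand-in for the `inspect/` directory. In the crate, each inner
// module would live in its own file (metadata_table.rs, snapshots.rs, manifests.rs).
pub mod inspect {
    pub mod metadata_table {
        use super::manifests::ManifestsTable;
        use super::snapshots::SnapshotsTable;

        /// Entry point that hands out the individual metadata tables.
        /// (Hypothetical unit struct; the real type wraps a `Table`.)
        #[derive(Debug, Default)]
        pub struct MetadataTable;

        impl MetadataTable {
            pub fn snapshots(&self) -> SnapshotsTable {
                SnapshotsTable
            }

            pub fn manifests(&self) -> ManifestsTable {
                ManifestsTable
            }
        }
    }

    pub mod snapshots {
        /// The "snapshots" metadata table (placeholder body).
        #[derive(Debug)]
        pub struct SnapshotsTable;

        impl SnapshotsTable {
            /// Hypothetical helper, used here only to tell the tables apart.
            pub fn name(&self) -> &'static str {
                "snapshots"
            }
        }
    }

    pub mod manifests {
        /// The "manifests" metadata table (placeholder body).
        #[derive(Debug)]
        pub struct ManifestsTable;

        impl ManifestsTable {
            /// Hypothetical helper, used here only to tell the tables apart.
            pub fn name(&self) -> &'static str {
                "manifests"
            }
        }
    }

    // Re-exports keep callers independent of the file layout (an assumption
    // about how the module might be exposed, not confirmed by the PR).
    pub use manifests::ManifestsTable;
    pub use metadata_table::MetadataTable;
    pub use snapshots::SnapshotsTable;
}
```

With this shape, adding another metadata table would mean adding one more module and one more accessor, leaving the existing files untouched.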

In the future this can expand as described in #863 (comment).

rshkv · 2025-01-05T17:57:00Z

cc @liurenjie1024 @Xuanwo

liurenjie1024

Thanks @rshkv for this PR, LGTM! Some issues are tracked elsewhere, such as #870; I just left one question to discuss.

liurenjie1024 · 2025-01-07T06:29:38Z

crates/iceberg/src/inspect/metadata_table.rs

+/// - <https://iceberg.apache.org/docs/latest/spark-queries/#querying-with-sql>
+/// - <https://py.iceberg.apache.org/api/#inspecting-tables>
+#[derive(Debug)]
+pub struct MetadataTable(Table);


I don't quite understand why we need this struct here. It seems to be just a wrapper that provides more API, when we could simply write this:

impl Table {
    pub fn snapshots(&self) -> SnapshotsTable { ... }
    pub fn manifests(&self) -> ManifestsTable { ... }
}

cc @Xuanwo @sdd @Fokko What do you think?

> I don't quite understand why we need this data struct here. It seems just a wrapper to provide more api, while just like this:

Yes, I intend to make the API exposed at Table more organized. For example, users will have:

table.metadata_table().snapshots();
table.metadata_table().manifests();

instead of:

// Could be confused with `table.metadata().snapshots()` which returns `Snapshot`.
table.snapshots();
// Verbose and long
table.metadata_snapshots_table();
table.metadata_manifests_table();

While I believe we could use better API names, such as table.inspect().snapshots(), the overall structure looks good to me and aligns better with other implementations.
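
To make the trade-off in this thread concrete, here is a minimal sketch of the wrapper approach using stand-in types. Only `MetadataTable(Table)`, `metadata_table()`, `snapshots()`, and `manifests()` appear in the discussion; the fields, `with_snapshot_ids`, `row_count`, and `name` are hypothetical and exist only so the example compiles on its own.

```rust
/// Stand-in for a snapshot entry in the table metadata (hypothetical field).
#[derive(Debug, Clone, PartialEq)]
pub struct Snapshot {
    pub snapshot_id: i64,
}

/// Stand-in for the table's metadata object.
#[derive(Debug, Clone, Default)]
pub struct TableMetadata {
    snapshots: Vec<Snapshot>,
}

impl TableMetadata {
    /// Returns the snapshots themselves, not a metadata table.
    pub fn snapshots(&self) -> &[Snapshot] {
        &self.snapshots
    }
}

/// Stand-in for the table type.
#[derive(Debug, Clone, Default)]
pub struct Table {
    metadata: TableMetadata,
}

impl Table {
    /// Hypothetical constructor used only for this example.
    pub fn with_snapshot_ids(ids: &[i64]) -> Self {
        let snapshots = ids.iter().map(|&snapshot_id| Snapshot { snapshot_id }).collect();
        Table { metadata: TableMetadata { snapshots } }
    }

    pub fn metadata(&self) -> &TableMetadata {
        &self.metadata
    }

    /// Single entry point that namespaces all metadata tables.
    pub fn metadata_table(&self) -> MetadataTable {
        MetadataTable(self.clone())
    }
}

/// Wrapper that groups the metadata tables away from `Table`'s own methods.
#[derive(Debug)]
pub struct MetadataTable(Table);

impl MetadataTable {
    pub fn snapshots(&self) -> SnapshotsTable {
        SnapshotsTable { table: self.0.clone() }
    }

    pub fn manifests(&self) -> ManifestsTable {
        ManifestsTable { table: self.0.clone() }
    }
}

/// The "snapshots" metadata table (simplified to own a copy of the table).
#[derive(Debug)]
pub struct SnapshotsTable {
    table: Table,
}

impl SnapshotsTable {
    /// Hypothetical helper: one row per snapshot in the table metadata.
    pub fn row_count(&self) -> usize {
        self.table.metadata().snapshots().len()
    }
}

/// The "manifests" metadata table (simplified to own a copy of the table).
#[derive(Debug)]
pub struct ManifestsTable {
    table: Table,
}

impl ManifestsTable {
    /// Hypothetical helper returning the metadata table's name.
    pub fn name(&self) -> &'static str {
        "manifests"
    }

    /// The table this metadata table was created from.
    pub fn table(&self) -> &Table {
        &self.table
    }
}
```

In this sketch, `table.metadata().snapshots()` and `table.metadata_table().snapshots()` stay visibly distinct at the call site, which is the naming confusion the wrapper is meant to avoid.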

Sounds reasonable to me.

The metadata_table to inspect rename is here: #881

liurenjie1024 · 2025-01-07T06:32:14Z

crates/iceberg/src/inspect/snapshots.rs

+    }
+
+    /// Returns the schema of the snapshots table.
+    pub fn schema(&self) -> Schema {


Same question as #868, but we could defer this to a later issue.

liurenjie1024 · 2025-01-07T06:33:30Z

crates/iceberg/src/inspect/snapshots.rs

+    }
+
+    /// Scans the snapshots table.
+    pub fn scan(&self) -> Result<RecordBatch> {


Same as #870

liurenjie1024

Thanks @rshkv for this pr, LGTM!

liurenjie1024 · 2025-01-07T09:41:34Z

I'll merge this first as it's a pure refactoring.

rshkv force-pushed the wr/metadata-split branch from 10a3d48 to a6017ce on January 3, 2025 14:50

rshkv mentioned this pull request Jan 3, 2025

feat: Support metadata table "Entries" #863

Open

rshkv force-pushed the wr/metadata-split branch from a6017ce to e8cabed on January 3, 2025 14:52

Split metadata table into separate modules

be77803

rshkv force-pushed the wr/metadata-split branch from e8cabed to be77803 on January 5, 2025 19:05

liurenjie1024 reviewed Jan 7, 2025

liurenjie1024 mentioned this pull request Jan 7, 2025

Metadata table scans as streams #870

Merged

liurenjie1024 approved these changes Jan 7, 2025

liurenjie1024 merged commit 25e8909 into apache:main Jan 7, 2025
16 checks passed

rshkv mentioned this pull request Jan 7, 2025

Rename 'metadata_table' to 'inspect' #881

Merged

liurenjie1024 merged 1 commit into apache:main from rshkv:wr/metadata-split
