Skip to content

feat: [iceberg] REST catalog support for CometNativeIcebergScan#2895

Merged
mbutrovich merged 15 commits intoapache:mainfrom
mbutrovich:iceberg_rest
Dec 15, 2025
Merged

feat: [iceberg] REST catalog support for CometNativeIcebergScan#2895
mbutrovich merged 15 commits intoapache:mainfrom
mbutrovich:iceberg_rest

Conversation

@mbutrovich
Copy link
Contributor

@mbutrovich mbutrovich commented Dec 13, 2025

Which issue does this PR close?

N/A.

Rationale for this change

REST catalogs return metadata locations that don't physically exist on disk - the metadata is transferred via HTTP. This caused native Iceberg scans to fall back to Spark.

What changes are included in this PR?

Fixed reflection for protected methods:

  • Changed IcebergReflection.getTableMetadata() and getFormatVersion() to use getDeclaredMethod() with setAccessible(true) to access protected methods in BaseTable

Handle missing metadata files:

  • Detect when metadata file doesn't exist (REST catalog case)
  • Fall back to table location for FileIO initialization in iceberg-rust
  • The location is only used to determine storage backend type (local/S3/GCS) - actual metadata comes from the FileScanTasks

REST catalog test infrastructure:

  • Copied RESTCatalogServlet from Iceberg
  • Added Jetty dependencies for embedded REST server
  • Added withRESTCatalog test helper

How are these changes tested?

New test "REST catalog with native Iceberg scan" creates an embedded Jetty server with InMemoryCatalog backend, creates a table via REST API, and verifies native Iceberg scan is used with correct results.

All Iceberg Java tests pass locally with these changes.

@mbutrovich mbutrovich changed the title Iceberg rest feat: [iceberg] REST catalog support for CometNativeIcebergScan Dec 13, 2025
@codecov-commenter
Copy link

codecov-commenter commented Dec 13, 2025

Codecov Report

❌ Patch coverage is 67.85714% with 9 lines in your changes missing coverage. Please review.
✅ Project coverage is 59.41%. Comparing base (f09f8af) to head (7cde53a).
⚠️ Report is 766 commits behind head on main.

Files with missing lines Patch % Lines
...n/scala/org/apache/comet/rules/CometScanRule.scala 55.00% 7 Missing and 2 partials ⚠️
Additional details and impacted files
@@             Coverage Diff              @@
##               main    #2895      +/-   ##
============================================
+ Coverage     56.12%   59.41%   +3.28%     
- Complexity      976     1379     +403     
============================================
  Files           119      167      +48     
  Lines         11743    15342    +3599     
  Branches       2251     2545     +294     
============================================
+ Hits           6591     9115    +2524     
- Misses         4012     4943     +931     
- Partials       1140     1284     +144     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment on lines +207 to +208
val opsMethod = table.getClass.getDeclaredMethod("operations")
opsMethod.setAccessible(true)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: maybe add a helper method for this pattern of getDeclaredMethod and setAccessible since this is repeated. Can be in a future PR

Copy link
Member

@andygrove andygrove left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mbutrovich

Copy link
Contributor

@comphead comphead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @mbutrovich

Wondering should we keep credits for the files copied from Iceberg?

@hsiang-c
Copy link
Contributor

hsiang-c commented Dec 15, 2025

LGTM, with this fix, I'm able to see CometIcebergNativeScan while querying Iceberg table from Iceberg REST Catalog

AdaptiveSparkPlan isFinalPlan=false
+- CometHashAggregate
   +- CometExchange SinglePartition, ENSURE_REQUIREMENTS, CometNativeShuffle, [plan_id=62]
      +- CometHashAggregate
         +- CometHashAggregate
            +- CometExchange
               +- CometHashAggregate
                  +- CometProject
                     +- CometIcebergNativeScan

Copy link
Contributor

@hsiang-c hsiang-c left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

* RESTCatalogAdaptor to proxy the REST Spec to any Catalog implementation.
* Modified version of Iceberg's org/apache/iceberg/rest/RESTCatalogServlet.java
*/
public class RESTCatalogServlet extends HttpServlet {
Copy link
Contributor

@hsiang-c hsiang-c Dec 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can, in Maven, depends on iceberg-core using the tests classifier so that you don't have to copy them. @mbutrovich @comphead

Iceberg does it in Gradle by https://github.com/apache/iceberg/blob/main/build.gradle#L439

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll give this a shot, thank you!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm gonna open a follow-up issue for this. I fell into a dependency nightmare again.

import org.apache.iceberg.util.PropertyUtil;

/** Adaptor class to translate REST requests into {@link Catalog} API calls. */
public class RESTCatalogAdapter implements RESTClient {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ditto, if depending on iceberg-core's test artifacts is okay in the unit test scope, we can avoid this copy.

@mbutrovich mbutrovich merged commit a51b281 into apache:main Dec 15, 2025
171 of 173 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants