feat: [iceberg] REST catalog support for CometNativeIcebergScan#2895
feat: [iceberg] REST catalog support for CometNativeIcebergScan#2895mbutrovich merged 15 commits intoapache:mainfrom
Conversation
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2895 +/- ##
============================================
+ Coverage 56.12% 59.41% +3.28%
- Complexity 976 1379 +403
============================================
Files 119 167 +48
Lines 11743 15342 +3599
Branches 2251 2545 +294
============================================
+ Hits 6591 9115 +2524
- Misses 4012 4943 +931
- Partials 1140 1284 +144 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
| val opsMethod = table.getClass.getDeclaredMethod("operations") | ||
| opsMethod.setAccessible(true) |
There was a problem hiding this comment.
nit: maybe add a helper method for this pattern of getDeclaredMethod and setAccessible since this is repeated. Can be in a future PR
comphead
left a comment
There was a problem hiding this comment.
Thanks @mbutrovich
Wondering should we keep credits for the files copied from Iceberg?
|
LGTM, with this fix, I'm able to see CometIcebergNativeScan while querying Iceberg table from Iceberg REST Catalog AdaptiveSparkPlan isFinalPlan=false
+- CometHashAggregate
+- CometExchange SinglePartition, ENSURE_REQUIREMENTS, CometNativeShuffle, [plan_id=62]
+- CometHashAggregate
+- CometHashAggregate
+- CometExchange
+- CometHashAggregate
+- CometProject
+- CometIcebergNativeScan |
| * RESTCatalogAdaptor to proxy the REST Spec to any Catalog implementation. | ||
| * Modified version of Iceberg's org/apache/iceberg/rest/RESTCatalogServlet.java | ||
| */ | ||
| public class RESTCatalogServlet extends HttpServlet { |
There was a problem hiding this comment.
I think we can, in Maven, depends on iceberg-core using the tests classifier so that you don't have to copy them. @mbutrovich @comphead
Iceberg does it in Gradle by https://github.com/apache/iceberg/blob/main/build.gradle#L439
There was a problem hiding this comment.
I'll give this a shot, thank you!
There was a problem hiding this comment.
I'm gonna open a follow-up issue for this. I fell into a dependency nightmare again.
| import org.apache.iceberg.util.PropertyUtil; | ||
|
|
||
| /** Adaptor class to translate REST requests into {@link Catalog} API calls. */ | ||
| public class RESTCatalogAdapter implements RESTClient { |
There was a problem hiding this comment.
Ditto, if depending on iceberg-core's test artifacts is okay in the unit test scope, we can avoid this copy.
Which issue does this PR close?
N/A.
Rationale for this change
REST catalogs return metadata locations that don't physically exist on disk - the metadata is transferred via HTTP. This caused native Iceberg scans to fall back to Spark.
What changes are included in this PR?
Fixed reflection for protected methods:
IcebergReflection.getTableMetadata()andgetFormatVersion()to usegetDeclaredMethod()withsetAccessible(true)to access protected methods inBaseTableHandle missing metadata files:
FileIOinitialization in iceberg-rustFileScanTasksREST catalog test infrastructure:
RESTCatalogServletfrom IcebergwithRESTCatalogtest helperHow are these changes tested?
New test "REST catalog with native Iceberg scan" creates an embedded Jetty server with InMemoryCatalog backend, creates a table via REST API, and verifies native Iceberg scan is used with correct results.
All Iceberg Java tests pass locally with these changes.