Add external schema mappings for files written with name-based schemas #40#207
Add external schema mappings for files written with name-based schemas #40#207rdblue merged 15 commits intoapache:masterfrom
Conversation
core/src/test/java/org/apache/iceberg/avro/TestExtNameToFieldIdsMap.java
Outdated
Show resolved
Hide resolved
|
Thanks @rdblue for the comments. I wanted to get a sense from you for the overall approach here, specifically around the mapping , where the fieldname to id map contains non canonical field names and testing strategy. Do you think the current approach is sound enough? |
|
Hi @rdblue , please have a look when u get the time. I've addressed your remaining comments. I need your inputs specifically on some of the TODOs I've added to the code. |
core/src/test/java/org/apache/iceberg/avro/TestExtSchemaMapping.java
Outdated
Show resolved
Hide resolved
|
Sorry for not getting back to this very quickly. I'm planning to review it this week. |
|
@rdsr, thanks for your patience. This looks like a great start. I made a few specific comments. In addition, I think we may want to apply the schema mapping differently. Instead of building an Avro schema with IDs, why not add an optional mapping to the conversion from Avro to Iceberg schema? Then that conversion could drop fields that don't have IDs, either embedded in the schema or from a mapping. I think that would fit a bit more cleanly because it would not be creating an intermediate file schema. It would be producing the Iceberg equivalent of the Avro schema instead. What do you think? |
|
I think the passing an optional mapping to the conversion from Avro to Iceberg schema makes sense. Let me investigate more on this. Thanks for the comments! |
|
@rdsr, I've been working on #41, and I think we may have some restructuring to the mapping, instead of a simple name to ID map. Supporting aliases with a simple map requires enumerating all of the possibilities, and it is difficult to work with. I think a multi-level approach, where each struct has a mapping is going to be better to work with. I'll clean up what I have and post a PR for you to look at. |
|
Thanks @rdblue ! |
|
@rdblue I've updated the approach here. I now pass an optional The general logic now is
|
|
Thanks for the update, @rdsr! I'll have a look next week when I'm back. |
core/src/test/java/org/apache/iceberg/avro/TestAvroNameMapping.java
Outdated
Show resolved
Hide resolved
189dc0a to
fb69c79
Compare
…Mapping, * Change the read schema to rename a field uniquely if that field is not projected by NameMapping
|
I've updated the Requesting a review @rdblue ! |
core/src/test/java/org/apache/iceberg/avro/TestAvroNameMapping.java
Outdated
Show resolved
Hide resolved
core/src/test/java/org/apache/iceberg/avro/TestAvroNameMapping.java
Outdated
Show resolved
Hide resolved
|
Merging this. Nice work, @rdsr! Thanks for taking care of this. |
Here's a first pass at implementing this.
External mapping is maintained as a field name -> field id map. So each alias will be an different entry in the map with the same id.
AssignFieldIdsassign field ids to fields. This visitor updates the schema instead of building a new schema with field ids. I felt this was OK. Let me know if this is not the case.TODOs
In code, Open questions I've added as TODOs. I'd appreciate to if we can discuss those in the comments