-
Notifications
You must be signed in to change notification settings - Fork 3k
Description
Apache Iceberg version
1.6.0 (latest release)
Query engine
None
Please describe the bug 🐞
I've reproduced this with the following test added to TestSchemaUpdate.java:
@Test
public void testAddFieldWithDotInName() {
Schema expected =
new Schema(
required(1, "id", Types.IntegerType.get()),
optional(2, "dot", Types.StructType.of(
required(3, "field", Types.IntegerType.get()))),
optional(4, "dot.field", Types.IntegerType.get()));
Schema schema = new Schema(
required(1, "id", Types.IntegerType.get()),
optional(2, "dot", Types.StructType.of(
required(3, "field", Types.IntegerType.get()))));
Schema result =
new SchemaUpdate(schema, schema.highestFieldId())
.addColumn(null,"dot.field", Types.IntegerType.get())
.apply();
assertThat(result.asStruct()).isEqualTo(expected.asStruct());
}Seems like there are some internal maps (e.g. Schema.nameToId) that use plain strings as keys. These strings get built through joining nested fields using dots. This makes field nested under dot and a non-nested field dot.field result in the same key, causing the above test to fail.
I started looking at what would be needed to resolve this. Ideally the key isn't a string, but maybe a list of strings that represents different levels of nesting, or something similar. Otherwise, the dots should be escaped and unescaped when dealing with fields that actually have a dot in the name.
The main issue with this is that there are already quite a few methods that implicitly assume the field name doesn't have a dot inside. Some examples: deleteColumn, moveAfter and basically anything on UpdateSchema that doesn't accept a parent field (though even then, what if the parent field has a dot inside?). So more methods (or another interface with said methods) would need to be added in order to support the use case of a dot inside field names, in order to not break existing code.
Edit: I see this is actually a known limitation and s similar solution as suggested above was already discussed, but decided against for the sake of simplicity. IMO this limitation should still be documented in the spec instead of only in error messages.
Willingness to contribute
- I can contribute a fix for this bug independently
- I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- I cannot contribute a fix for this bug at this time