fix: field id in name mapping should be optional#1426
Conversation
|
@barronw looks like theres a linter issue, could you try to run |
kevinjqliu
left a comment
There was a problem hiding this comment.
Generally LGTM, added a few comments.
I verified that this is according to the spec
https://iceberg.apache.org/spec/#column-projection
names: A required list of 0 or more names for a field.
field-id: An optional Iceberg field ID used when a field's name is present in names
fields: An optional list of field mappings for child field of structs, maps, and lists.
| "names": [] | ||
| } | ||
| """ | ||
| assert MappedField(field_id=None, names=[]) == MappedField.model_validate_json(mapped_field) |
There was a problem hiding this comment.
nit: also test omitting the field_id=None
| def field(self, field: NestedField, field_partner: Optional[MappedField], field_result: IcebergType) -> IcebergType: | ||
| if field_partner is None: | ||
| raise ValueError(f"Field missing from NameMapping: {'.'.join(self.current_path)}") | ||
| if field_partner is None or field_partner.field_id is None: |
There was a problem hiding this comment.
I'm curious about this change, why do we need to check for field_partner.field_id
There was a problem hiding this comment.
This should just be a type check since NestedField expects a field ID below. The field partner is looked up by field ID before being passed into this method.
There was a problem hiding this comment.
I see field_id is a required field in NestedField. Without field_partner.field_id is None, type checker errors
45f18aa to
afc36f8
Compare
| update.name: update.field_id for f in field_results if (update := self._updates.get(f.field_id)) | ||
| update.name: update.field_id | ||
| for f in field_results | ||
| if f.field_id is not None and (update := self._updates.get(f.field_id)) |
There was a problem hiding this comment.
what is the rationale behind this change? Should we look at all the other places where .field_id is used?
There was a problem hiding this comment.
The update_mapping API doesn't currently support changes to mappings without a field ID since updates is typed with Dict[int, NestedField].
There was a problem hiding this comment.
I see, is this change also due to the type checker?
There was a problem hiding this comment.
Yes, also changing the API of updating_mapping probably requires a larger discussion.
| def field(self, field: NestedField, field_partner: Optional[MappedField], field_result: IcebergType) -> IcebergType: | ||
| if field_partner is None: | ||
| raise ValueError(f"Field missing from NameMapping: {'.'.join(self.current_path)}") | ||
| if field_partner is None or field_partner.field_id is None: |
There was a problem hiding this comment.
I see field_id is a required field in NestedField. Without field_partner.field_id is None, type checker errors
kevinjqliu
left a comment
There was a problem hiding this comment.
LGTM! Id like another set of eyes to look over the changes w.r.t NameMapping
pyiceberg/table/name_mapping.py
Outdated
| field_id = {"field-id": self.field_id} if self.field_id is not None else {} | ||
| fields = {"fields": self.fields} if len(self.fields) > 0 else {} | ||
| return { | ||
| "field-id": self.field_id, | ||
| **field_id, | ||
| "names": self.names, | ||
| **fields, | ||
| } |
There was a problem hiding this comment.
More a style thing, I think it is a bit awkward to merge all the dicts, how about:
serialized = {
"names": self.names
}
if self.field_id is not None:
serialized['field-id'] = self.field_id
if len(self.fields) > 0:
serialized['fields'] = fields
return serializedThere was a problem hiding this comment.
Updated, needed to also reorder fields in the tests.
afc36f8 to
1fd60fd
Compare
|
Thanks for fixing this @barronw and thanks for the review @kevinjqliu |
Closes #1420.