Use Pydantic's model_copy for model modification when updating table metadata#182
Conversation
# Conflicts: # pyiceberg/table/__init__.py # tests/conftest.py
# Conflicts: # tests/conftest.py
…able_metadata_update_model_copy # Conflicts: # pyiceberg/table/__init__.py
# Conflicts: # pyiceberg/table/__init__.py # tests/table/test_init.py
pyiceberg/table/__init__.py
Outdated
| new_metadata = _apply_table_update(update, new_metadata, context) | ||
|
|
||
| # Rebuild metadata to trigger validation | ||
| new_metadata = TableMetadataUtil.parse_obj(copy(new_metadata.model_dump())) |
There was a problem hiding this comment.
Since model_copy performs a shallow copy by default, I believe we need to execute a deep copy before returning the final new_metadata. Otherwise, base_metadata might be inadvertently altered due to any improper updates applied to new_metadata subsequently.
Furthermore, as indicated by pydantic/pydantic#418 and by some local tests, model_copy(update=) does not validate the contents of update. I think it might be good to reconstruct the metadata at this point to initiate the validation process.
(Alternatively, we could perform a deep copy here and incorporate the validation into our unit tests. Open to discussion on this approach.)
There was a problem hiding this comment.
I think we're okay to skip the validation on the Pydantic side, we should validate the input anyway to generate a more meaningful error.
We could set it to deep-copy using the deep=True kwarg. Doing this in Pydantic is probably an order of magnitude more efficient than doing this in Python.
There was a problem hiding this comment.
Thanks for the explanation! I changed to model_copy(deep=True) here and moved the validation to a unit test
Fokko
left a comment
There was a problem hiding this comment.
This is great @HonahX! Thanks for fixing this right away.
One thing I wondered if it is better to do a deep-copy when returning the final metadata, or just deep-copy in the steps where we update the metadata. But I think the current approach is more efficient when there are several updates.
Fixes #179
This PR uses Pydantic's
model_copyto apply table updates to metadata. Specifically:AddSchema,SetCurrentSchema,AddSnapshot,SetSnapshotRef, we usebase_metadata.model_copy(update=...)to get the updated metadata.UpgradeFormatVersion, we still need to rebuild the whole model fromTableMetadataV1toTableMetadataV2@Fokko Could you please take a look at this when you have a chance? Thanks!