feat: Add basic operations for UpdateSchema#1172
feat: Add basic operations for UpdateSchema#1172jonathanc-n wants to merge 14 commits intoapache:mainfrom
UpdateSchema#1172Conversation
|
@Fokko @Xuanwo @liurenjie1024 This should be ready for review |
CTTY
left a comment
There was a problem hiding this comment.
Hi @jonathanc-n , thanks for the work! I've left some comments. Also there are some implementations missing:
- functions like
moveandunion_schema applylogic to commit changes to the schema
Do you plan to address them in this PR?
| use std::fmt::{Display, Formatter}; | ||
| use std::sync::Arc; | ||
|
|
||
| mod update; |
There was a problem hiding this comment.
we should probably name it update_schema.rs avoid confusion
There was a problem hiding this comment.
I think it is best practice to infer this from the folder name. For example codebases such as Datafusion or iceberg-rust, files named metadata are just called metadata.rs under different folders (ex. manifest, puffin, etc.) Unless it becomes ambiguous with other names in the same folder
There was a problem hiding this comment.
I'm very new to rust naming convention, thanks for the context! In this case, probably schema.rs is a better name?
I'm adding a new file update_statistics.rs under the same folder in this PR: #1359 I'll probably rename it to statistics.rs, wdyt?
There was a problem hiding this comment.
No the folder gives context, so schema/update.rs -> updating schema. with transaction/update_statistics, there is no context given to what the file is doing if it is called statistics.rs, so update_statistics is fine.
There was a problem hiding this comment.
Understood, thanks for the explanation!
| /// This method returns a reference to `Self` to allow for method chaining. | ||
| fn add_column( | ||
| &mut self, | ||
| column_name: Vec<String>, |
There was a problem hiding this comment.
is there any considerations that we do not want to take the column name string and then find the parent via schema like iceberg-java?
I did notice that iceberg-python followed this pattern, and would love to understand the context
There was a problem hiding this comment.
I believe to avoid having dot in names being a problem, not sure if there is another reason though.
| column_name: Vec<String>, | ||
| field_type: Type, | ||
| doc: Option<String>, | ||
| required: bool, |
There was a problem hiding this comment.
There is a recent PR to support default values in UpdateSchema, it would be good to port that to iceberg-rs as well: apache/iceberg@602c35a
There was a problem hiding this comment.
I think this would also be nice in a follow up pull request. I created an issue for that here.
| /// # Returns | ||
| /// | ||
| /// An empty Ok(()) on success. | ||
| pub fn set_column_requirement( |
There was a problem hiding this comment.
I think it would be better to make this private and add APIs like make_column_optional to call it
There was a problem hiding this comment.
I think just changing the function name would be fine? what do you think?
There was a problem hiding this comment.
I'm thinking of something like
pub fn require_column(col_name) { set_column_requirement(col_name, true) }
pub fn make_column_optional(col_name) {set_column_requirement(col_name, false)}
fn set_column_requirement { // this function }
|
|
||
| #[allow(dead_code)] | ||
| #[derive(Debug)] | ||
| pub struct Move { |
There was a problem hiding this comment.
it seems like move related functions are not implemented yet?
There was a problem hiding this comment.
Yes this will be implemented in a follow up pull request.
|
Are unit tests being tracked separately? |
| if !self.deletes.contains(&existing_field.id) { | ||
| return Err(Error::new( | ||
| crate::ErrorKind::DataInvalid, | ||
| format!("Cannot add column {}, to non-struct type.", name), |
| _ => parent_field.clone(), | ||
| }; | ||
|
|
||
| if !parent_type.is_struct() { |
There was a problem hiding this comment.
Are there any other invariants to check about parent? It shouldn't be a part of self.deletes for example.
| } | ||
| } | ||
|
|
||
| if parent.is_empty() { |
There was a problem hiding this comment.
I think I'm probably misunderstanding something, don't we want to search for parent if its not empty?
|
hey @jonathanc-n -- very interested to see this get accepted and merged. anything i can do to help get this over the line? |
|
This is a hero PR I want deadly, any chance we could revive it 😭 |
|
Sorry I missed this, I don't think I'll have time to relook at the changed transaction architecture. Someone can possibly use this code and see if they can do something with it |
|
We need to update the pr for latest tx api. |
Signed-off-by: StandingMan <[email protected]>
Which issue does this PR close?
SchemaUpdatelogic to Iceberg-Rust #697What changes are included in this PR?
Added basic functionality to UpdateSchema. Wanted to split it up in two parts.
Are these changes tested?