-
Notifications
You must be signed in to change notification settings - Fork 4k
Refactor various aspects of arrow handling in frontend #10150
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| const allRows = range(rows) | ||
| const { cssId, cssStyles, caption } = table.styler ?? {} | ||
| const { headerRows, rows: numRows, columns: numColumns } = table.dimensions | ||
| const allRows = range(numRows) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: numRows sounds like it is an integer, whereas allRows sounds like it is an array of all actual rows. I assume it rather is allRowIndices or something like that. If that is true, I would rename the variable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, changed it 👍
| quiverData: Quiver | null | ||
| ): { [field: string]: any }[] | null { | ||
| if (!quiverData || quiverData.data.numRows === 0) { | ||
| if (!quiverData || quiverData.dimensions.dataRows === 0) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is quiverData.dimensions.dataRows a number / count? If yes, it might make sense to rename it to quiverData.dimensions.dataRowCount or similar
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I wanted to rename it a couple of times... e.g. prefix all variables with num ar suffix with Count. But haven't done so yet since its already under the dimensions object. But it probably would make code more readable if it has an additional indication that its just a number.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I updated the namings
| const { dataRows: numRows, dataColumns: numColumns } = quiverData.dimensions | ||
|
|
||
| // This currently only works with a single index column. | ||
| // To support multiple index columns would require some |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super-nit: // To support ... -> // Supporting ...
frontend/lib/src/components/elements/ArrowVegaLiteChart/arrowUtils.ts
Outdated
Show resolved
Hide resolved
frontend/lib/src/components/widgets/DataFrame/columns/NumberColumn.ts
Outdated
Show resolved
Hide resolved
|
|
||
| // If not custom format is provided & the column type is duration or period, | ||
| // instruct the column to use the arrow formatting for the display value. | ||
| const useArrowFormatting = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it looks like this variable is first used in line 195. I would move it directly before that within the try block
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can see similar aspects a lot in our dataframe columns. This is for optimization. If I moved this down, it would be executed on every cell rendering -> and this is a high-risk area for performance impact. There could be a lot of cell renderings, e.g. if you scroll through a dataframe. All computations done on top here are only done once at column initialization -> so I generally try to do as many pre computations as possible that can happen on a column level on top.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahhh the code below is in a callback?! I did not realize, thanks for pointing that out. Makes a lot of sense 👍
| t1: PandasColumnType[], | ||
| t2: PandasColumnType[] | ||
| ): boolean { | ||
| // NOTE: We remove extra columns from the DataFrame that we add rows from. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
where do we remove this? This here is a standalone function, so the comment probably needs some more context or rephrasing. I would probably write it more generically: we consider Dataframes to have the same types if all columns that exist in t1 also exist in t2 in the same order and with the same type.
I would also extend the JSDoc above to state that t2 can be larger than t1 so that it is directly obvious from the doc string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method was just moved out of arrowTypeUtils and it's untouched from me in its original form. While the overall addRows functionality would also require some refactoring, I do not plan to do it in the context of this project since the add_rows feature usage is almost non-existent (0.036%). It's in a ball range that we could consider deprecating/removing this feature all altogether.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cleaned up the comments based on your suggestion
| ) | ||
| } | ||
|
|
||
| /** True if both arrays contain the same index types in the same order. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In contrast to sameDataTypes, I would state in the JSDoc that If the arrays have different lengths, they are never the same
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added 👍
| DatetimeIndex = "datetime", | ||
| Float64Index = "float64", | ||
| Int64Index = "int64", | ||
| RangeIndex = "range", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is the only element left in the enum, maybe make it a constant? Or do we expect more elements to be added again?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep, only range index needs special treatment. Changed it to constant.
## Describe your changes This PR contains a mix of smaller refactorings across all components that use arrow. It doesn't apply any behavioral or visual changes. ## Testing Plan - Doesn't apply any behavioral or visual changes - Added tests for new functions. --- **Contribution License Agreement** By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license. --------- Co-authored-by: Lukas Masuch <[email protected]>
Describe your changes
This PR contains a mix of smaller refactorings across all components that use arrow. It doesn't apply any behavioral or visual changes.
Testing Plan
Contribution License Agreement
By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.