Skip to content

Conversation

@lukasmasuch
Copy link
Collaborator

@lukasmasuch lukasmasuch commented Jan 9, 2025

Describe your changes

This PR contains a mix of smaller refactorings across all components that use arrow. It doesn't apply any behavioral or visual changes.

Testing Plan

  • Doesn't apply any behavioral or visual changes
  • Added tests for new functions.

Contribution License Agreement

By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.

@lukasmasuch lukasmasuch changed the title [WIP] Clean up arrow styler usage [WIP] Refactor arrow handling - misc Jan 9, 2025
@lukasmasuch lukasmasuch changed the title [WIP] Refactor arrow handling - misc [WIP] Cut down exposed methods in arrow handling API Jan 9, 2025
@lukasmasuch lukasmasuch added security-assessment-completed Security assessment has been completed for PR impact:internal PR changes only affect internal code change:refactor PR contains code refactoring without behavior change labels Jan 9, 2025
@lukasmasuch lukasmasuch changed the title [WIP] Cut down exposed methods in arrow handling API Refactor various aspects of arrow handling in frontend Jan 10, 2025
@lukasmasuch lukasmasuch marked this pull request as ready for review January 10, 2025 01:21
const allRows = range(rows)
const { cssId, cssStyles, caption } = table.styler ?? {}
const { headerRows, rows: numRows, columns: numColumns } = table.dimensions
const allRows = range(numRows)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: numRows sounds like it is an integer, whereas allRows sounds like it is an array of all actual rows. I assume it rather is allRowIndices or something like that. If that is true, I would rename the variable.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, changed it 👍

quiverData: Quiver | null
): { [field: string]: any }[] | null {
if (!quiverData || quiverData.data.numRows === 0) {
if (!quiverData || quiverData.dimensions.dataRows === 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is quiverData.dimensions.dataRows a number / count? If yes, it might make sense to rename it to quiverData.dimensions.dataRowCount or similar

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wanted to rename it a couple of times... e.g. prefix all variables with num ar suffix with Count. But haven't done so yet since its already under the dimensions object. But it probably would make code more readable if it has an additional indication that its just a number.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the namings

const { dataRows: numRows, dataColumns: numColumns } = quiverData.dimensions

// This currently only works with a single index column.
// To support multiple index columns would require some
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super-nit: // To support ... -> // Supporting ...


// If not custom format is provided & the column type is duration or period,
// instruct the column to use the arrow formatting for the display value.
const useArrowFormatting =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it looks like this variable is first used in line 195. I would move it directly before that within the try block

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can see similar aspects a lot in our dataframe columns. This is for optimization. If I moved this down, it would be executed on every cell rendering -> and this is a high-risk area for performance impact. There could be a lot of cell renderings, e.g. if you scroll through a dataframe. All computations done on top here are only done once at column initialization -> so I generally try to do as many pre computations as possible that can happen on a column level on top.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ahhh the code below is in a callback?! I did not realize, thanks for pointing that out. Makes a lot of sense 👍

t1: PandasColumnType[],
t2: PandasColumnType[]
): boolean {
// NOTE: We remove extra columns from the DataFrame that we add rows from.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

where do we remove this? This here is a standalone function, so the comment probably needs some more context or rephrasing. I would probably write it more generically: we consider Dataframes to have the same types if all columns that exist in t1 also exist in t2 in the same order and with the same type.

I would also extend the JSDoc above to state that t2 can be larger than t1 so that it is directly obvious from the doc string.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method was just moved out of arrowTypeUtils and it's untouched from me in its original form. While the overall addRows functionality would also require some refactoring, I do not plan to do it in the context of this project since the add_rows feature usage is almost non-existent (0.036%). It's in a ball range that we could consider deprecating/removing this feature all altogether.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I cleaned up the comments based on your suggestion

)
}

/** True if both arrays contain the same index types in the same order. */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In contrast to sameDataTypes, I would state in the JSDoc that If the arrays have different lengths, they are never the same

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added 👍

DatetimeIndex = "datetime",
Float64Index = "float64",
Int64Index = "int64",
RangeIndex = "range",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is the only element left in the enum, maybe make it a constant? Or do we expect more elements to be added again?

Copy link
Collaborator Author

@lukasmasuch lukasmasuch Jan 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, only range index needs special treatment. Changed it to constant.

@lukasmasuch lukasmasuch enabled auto-merge (squash) January 10, 2025 14:57
@lukasmasuch lukasmasuch merged commit f0dacf7 into develop Jan 10, 2025
33 checks passed
@lukasmasuch lukasmasuch deleted the refactor/arrow-styler-usage branch January 10, 2025 16:01
edegp pushed a commit to edegp/streamlit that referenced this pull request Jan 19, 2025
## Describe your changes

This PR contains a mix of smaller refactorings across all components
that use arrow. It doesn't apply any behavioral or visual changes.

## Testing Plan

- Doesn't apply any behavioral or visual changes
- Added tests for new functions. 

---

**Contribution License Agreement**

By submitting this pull request you agree that all contributions to this
project are made under the Apache 2.0 license.

---------

Co-authored-by: Lukas Masuch <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

change:refactor PR contains code refactoring without behavior change impact:internal PR changes only affect internal code security-assessment-completed Security assessment has been completed for PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants