Skip to content

Conversation

@lukasmasuch
Copy link
Collaborator

@lukasmasuch lukasmasuch commented Jan 7, 2025

Describe your changes

This PR simplifies the arrow handling logic by renaming variables and functions to more descriptive names. This also adds a variety of code comments to make it easier to understand the logic.

To simplify reviewing, this PR does not apply any logical changes.

Testing Plan

  • No logical or visual changes - no new tests required.

Contribution License Agreement

By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.

@lukasmasuch lukasmasuch added security-assessment-completed Security assessment has been completed for PR impact:internal PR changes only affect internal code change:refactor PR contains code refactoring without behavior change labels Jan 7, 2025
@lukasmasuch lukasmasuch changed the title [WIP] Simplify arrow utility Simplify arrow utility by renaming to more descriptive names Jan 9, 2025
@lukasmasuch lukasmasuch marked this pull request as ready for review January 9, 2025 15:10
* NOTE: ArrowJS automatically formats the columns in schema, i.e. we always get strings.
*/
export type Columns = string[][]
export type ColumnNames = string[][]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you are about to update the comments: maybe this comment could be updated with a quick info why this is a multidimensional array. I suppose it is for nested column names?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, to support multi-level column headers. Added it to the comment 👍


/** DataFrame index and data types. */
export interface Types {
/** DataFrame's index and data column types. */
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super-nit: /** A DataFrame's ...

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed 👍

/** DataFrame index and data types. */
export interface Types {
/** DataFrame's index and data column types. */
export interface ColumnTypes {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this interface seems to only support Pandas types, should it maybe be called PandasColumnTypes similar to the other renamings? And will this be generalized more in a follow-up step?

Copy link
Collaborator Author

@lukasmasuch lukasmasuch Jan 9, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed 👍

And will this be generalized more in a follow-up step?

Yep, there is still quite a bit of reliance on pandas metadata to remove to support purely raw arrow data. That will come with follow-up PRs. Goal is that the Pandas metadata is only used for additional accuracy, but the main information should come from the Arrow metadata. Unfortunately, the reliance on pandas' metadata was even more extreme than I had initially thought.

Copy link
Collaborator

@raethlein raethlein left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for making the names clearer!

@lukasmasuch lukasmasuch enabled auto-merge (squash) January 9, 2025 16:03
@lukasmasuch lukasmasuch merged commit a5b5bc3 into develop Jan 9, 2025
33 checks passed
edegp pushed a commit to edegp/streamlit that referenced this pull request Jan 19, 2025
…it#10123)

## Describe your changes

This PR simplifies the arrow handling logic by renaming variables and
functions to more descriptive names. This also adds a variety of code
comments to make it easier to understand the logic.

To simplify reviewing, this PR does not apply any logical changes.

## Testing Plan

- No logical or visual changes - no new tests required.

---

**Contribution License Agreement**

By submitting this pull request you agree that all contributions to this
project are made under the Apache 2.0 license.
@lukasmasuch lukasmasuch deleted the refactor/clean-up-arrow-utility branch March 5, 2025 18:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

change:refactor PR contains code refactoring without behavior change impact:internal PR changes only affect internal code security-assessment-completed Security assessment has been completed for PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants