Skip to content

Conversation

@ben-schwen
Copy link
Member

@ben-schwen ben-schwen commented Aug 31, 2022

Closes #5452

As also now document in the code, our rbindlist mapping follows a 3 pass approach:
1st pass - gather unique column names
2nd pass - count duplicates
3rd pass - create final column mapping

We actually do not change the underlying encodings here, but convert the read value for comparisons which seems safer since we do not want to alter the input tables.

@codecov
Copy link

codecov bot commented Aug 31, 2022

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 98.61%. Comparing base (014dafb) to head (45bc9ff).
Report is 1 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5453      +/-   ##
==========================================
- Coverage   98.61%   98.61%   -0.01%     
==========================================
  Files          79       79              
  Lines       14536    14534       -2     
==========================================
- Hits        14335    14333       -2     
  Misses        201      201              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@github-actions
Copy link

github-actions bot commented May 17, 2024

Comparison Plot

Generated via commit f85272b

Download link for the artifact containing the test results: ↓ atime-results.zip

Task Duration
R setup and installing dependencies 4 minutes and 54 seconds
Installing different package versions 7 minutes and 52 seconds
Running and plotting the test cases 2 minutes and 21 seconds

@MichaelChirico
Copy link
Member

Could you write out in the PR description what are the three call sites being updated? Is there any possibility of using a helper instead to avoid relying on updating 3 places to maintain similar changes in the future? Also please ensure there are tests for each of the 3 call sites (I guess codecov would be telling the truth here, but it's currently broken).

IIUC, UTF-8 was chosen as the "winning" encoding, it seems like even if none of the inputs are UTF-8. If that's right, I think it is fine, though it might be a breaking change, right? Please describe the behavior more explicitly in the NEWS item, and perhaps in ?rbindlist, and possibly write some new tests.

@MichaelChirico MichaelChirico added this to the 1.17.0 milestone Sep 4, 2024
@MichaelChirico
Copy link
Member

Saw my earlier comment. At least I'm consistent :)

@ben-schwen
Copy link
Member Author

Updated the description and also the code comments!

@MichaelChirico MichaelChirico merged commit f05893e into master Dec 3, 2024
6 of 7 checks passed
@MichaelChirico MichaelChirico deleted the rbindlist_encodings branch December 3, 2024 06:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

rbind cannot handle different name encodings

2 participants