Write binary files directly, bypass CSV intermediary #1876

Merged
sstruzik merged 6 commits into main from
feature/bypass-csv-intermediary
Feb 27, 2026

@sstruzik
Contributor

Summary

  • Eliminate the CSV intermediary step for GUL, IL/FM, and RI input files. Binary files are now written directly from DataFrames during preparation, with CSV output only when intermediary_csv=True (for debugging).
  • Refactor csvtobin converters to decouple CSV reading from binary writing, extracting reusable functions (df_to_ndarray, amplifications_write_bin, complex_items_write_bin).
  • Replace csv_to_bin runtime conversion with move_bin() which relocates pre-built .bin files to the run output directory.
  • Update IL/RI detection to check for .bin files (not just .csv), and update _check_each_inputs_directory to accept either format.
  • Use np.memmap(mode='r') instead of np.fromfile for demand-paged binary reads.

Eliminate the CSV intermediary step for GUL, IL/FM, and RI input files.
Previously the pipeline wrote DataFrames to CSV, then converted CSV to
binary at runtime. Now binary files are written directly from DataFrames
during preparation, with CSV output only when intermediary_csv=True
(for debugging).

Preparation changes:
- Refactor csvtobin converters to decouple CSV reading from binary
  writing: extract df_to_ndarray(), amplifications_write_bin(), and
  complex_items_write_bin() as reusable functions.
- write_gul_input_files(): replace per-column pop/prepare dicts with a
  unified files_write_info dict that drives binary+CSV output.
- get_il_input_items(): write FM binary files directly, rename pandas
  dtype dicts to avoid shadowing numpy dtypes, return .bin paths.
- write_files_for_reinsurance(): write RI binary files directly using
  df_to_ndarray().tofile().
- Thread intermediary_csv parameter through GenerateFiles, RunExposure,
  and all preparation functions.
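
A minimal sketch of the direct-write pattern described above, for illustration only. The record layout `items_dtype` and the helpers `df_to_structured()` and `write_table()` are hypothetical stand-ins, not the signatures of `df_to_ndarray()` or the writers in this PR; only the overall shape (DataFrame to structured ndarray to `.tofile()`, with CSV written only when `intermediary_csv=True`) is taken from the description.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical record layout; the real OasisLMF dtypes may differ.
items_dtype = np.dtype([("item_id", "i4"), ("coverage_id", "i4"), ("tiv", "f4")])


def df_to_structured(df, dtype):
    """Copy the DataFrame columns named in `dtype` into a structured ndarray."""
    out = np.empty(len(df), dtype=dtype)
    for name in dtype.names:
        out[name] = df[name].to_numpy()  # cast to the field's dtype on assignment
    return out


def write_table(df, dtype, out_dir, stem, intermediary_csv=False):
    """Write `<stem>.bin` directly; also write `<stem>.csv` only when requested."""
    bin_path = os.path.join(out_dir, stem + ".bin")
    df_to_structured(df, dtype).tofile(bin_path)
    if intermediary_csv:
        csv_path = os.path.join(out_dir, stem + ".csv")
        df[list(dtype.names)].to_csv(csv_path, index=False)
    return bin_path


if __name__ == "__main__":
    df = pd.DataFrame({"item_id": [1, 2], "coverage_id": [10, 20], "tiv": [100.0, 250.5]})
    with tempfile.TemporaryDirectory() as tmp:
        path = write_table(df, items_dtype, tmp, "items")
        print(os.path.basename(path), os.path.getsize(path), "bytes")
```

With the default `intermediary_csv=False` only the `.bin` file is produced, so the CSV round trip disappears from the normal path and remains available as a debugging aid.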

Execution changes:
- Replace csv_to_bin/_csv_to_bin with move_bin() which relocates
  pre-built .bin files to the run output directory.
- Update IL/RI detection in GenerateLossesDir and
  GenerateLossesDeterministic to check for .bin files (not just .csv).
- Update _check_each_inputs_directory to accept either .csv or .bin.
- Remove unused step_flag from deterministic loss commands.
- Use np.memmap(mode='r') instead of np.fromfile in load_as_ndarray
  and load_as_array for demand-paged binary reads.
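
A rough sketch of the execution-side flow listed above, under stated assumptions. `relocate_bin()`, `input_present()`, `load_records()`, and `record_dtype` are hypothetical names for illustration; they are not the actual `move_bin()`, `_check_each_inputs_directory`, or `load_as_ndarray` implementations. The only real library calls are `shutil.move` and `np.memmap`.

```python
import os
import shutil
import tempfile

import numpy as np

# Hypothetical record layout, used only for this example.
record_dtype = np.dtype([("item_id", "i4"), ("tiv", "f4")])


def relocate_bin(src_dir, dst_dir, name):
    """Move a pre-built `<name>.bin` into the run directory; no conversion step."""
    os.makedirs(dst_dir, exist_ok=True)
    dst = os.path.join(dst_dir, name + ".bin")
    shutil.move(os.path.join(src_dir, name + ".bin"), dst)
    return dst


def input_present(inputs_dir, name):
    """Treat an input as present if either the .bin or the .csv file exists."""
    return any(
        os.path.exists(os.path.join(inputs_dir, name + ext)) for ext in (".bin", ".csv")
    )


def load_records(path, dtype):
    """Map the file read-only; pages are read from disk only when accessed."""
    return np.memmap(path, dtype=dtype, mode="r")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "input")
        os.makedirs(src)
        np.array([(1, 10.0), (2, 20.0)], dtype=record_dtype).tofile(
            os.path.join(src, "items.bin")
        )
        dst = relocate_bin(src, os.path.join(tmp, "run"), "items")
        records = load_records(dst, record_dtype)
        print(records["item_id"].tolist())
        del records  # release the mapping before the directory is removed
```

Unlike `np.fromfile`, which reads the whole file into memory up front, a read-only `np.memmap` lets the operating system page data in as it is touched, which matters most for large input files that are only partially scanned.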

Test updates:
- Remove CsvToBin test class and associated test helpers.
- Update test_generate_files expected paths from .csv to .bin.
@sstruzik sstruzik requested a review from benhayes21 February 16, 2026 17:37
@sstruzik sstruzik requested a review from SkylordA February 18, 2026 08:32
@sstruzik sstruzik moved this to Waiting for Review in Oasis Dev Team Tasks Feb 18, 2026
@sstruzik
Contributor Author

The PiWind tests are failing because they check for the presence of CSV files in the created run directory.
I've switched intermediary_csv to True for the tests, so all files should be produced and can be checked.
Tests pass => https://github.com/OasisLMF/OasisLMF/actions/runs/22094933438

@SkylordA SkylordA self-requested a review February 25, 2026 11:54
@sambles sambles mentioned this pull request Feb 26, 2026
@sstruzik sstruzik merged commit 43f200f into main Feb 27, 2026
28 of 32 checks passed
@github-project-automation github-project-automation bot moved this from Waiting for Review to Done in Oasis Dev Team Tasks Feb 27, 2026
@awsbuild awsbuild added this to the 2.5.1 milestone Feb 27, 2026
@sambles sambles deleted the feature/bypass-csv-intermediary branch March 25, 2026 15:51