Skip to content

Optimize rake by replacing ipfn with vectorized IPF#135

Closed
neuralsorcerer wants to merge 4 commits intofacebookresearch:mainfrom
neuralsorcerer:ipfn
Closed

Optimize rake by replacing ipfn with vectorized IPF#135
neuralsorcerer wants to merge 4 commits intofacebookresearch:mainfrom
neuralsorcerer:ipfn

Conversation

@neuralsorcerer
Copy link
Copy Markdown
Collaborator

Added a vectorized _run_ipf_numpy helper that mirrors the original ipfn behaviour while avoiding the package dependency. Reworked the raking workflow to feed the new solver, rebuild the cell-weight mapping via DataFrame joins, and return the historical rake_weight series shape.

@meta-cla meta-cla bot added the cla signed label Nov 10, 2025
@meta-codesync
Copy link
Copy Markdown

meta-codesync bot commented Nov 10, 2025

@talgalili has imported this pull request. If you are a Meta employee, you can view this in D86672684.

@talgalili
Copy link
Copy Markdown
Contributor

talgalili commented Nov 10, 2025

Thanks @neuralsorcerer - this is super cool!

Since this is a 'big deal' change, could you please:

  1. Consider buffing up the tests for raking - just to make sure we're not missing any edge cases for this change? (if you think it's good as is, then feel free to keep it as is). Test file: https://github.com/facebookresearch/balance/blob/main/tests/test_rake.py
  2. Add code (and output) for doing a benchmark before and after the change - just so we'll get a sense of the gain? while I like skipping ipdf from the dependency, I want to make sure there are no edge cases that ipfn will solve now (or in the future), that we will miss.

WDYT?

@neuralsorcerer
Copy link
Copy Markdown
Collaborator Author

  1. Consider buffing up the tests for raking - just to make sure we're not missing any edge cases for this change? (if you think it's good as is, then feel free to keep it as is)

I was planning to add more tests incrementally in upcoming PRs. I am happy to add more tests in this PR if we are fine with it.

  1. Add code (and output) for doing a benchmark before and after the change - just so we'll get a sense of the gain? while I like skipping ipdf from the dependency, I want to make sure there are no edge cases that ipfn will solve now (or in the future), that we will miss.

That's a great suggestion. Will do it :)

@talgalili
Copy link
Copy Markdown
Contributor

talgalili commented Nov 10, 2025 via email

@facebook-github-bot
Copy link
Copy Markdown
Contributor

@neuralsorcerer has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Copy Markdown
Contributor

@neuralsorcerer has updated the pull request. You must reimport the pull request before landing.

@neuralsorcerer
Copy link
Copy Markdown
Collaborator Author

Added benchmark_ipfn.py file under benchmark folder for the benchmark code.

ipfn package our in-tree numpy solver
1.83ms 1.15ms

@talgalili
Copy link
Copy Markdown
Contributor

Great job, thanks @neuralsorcerer !

I'm working to get this land in the coming hour.

@talgalili
Copy link
Copy Markdown
Contributor

FYI @neuralsorcerer
Just wanted to let you know I'm still working on this diff and that it hasn't landed yet because there are a bunch of internal errors I'm sorting out on the files. I suspect it will land within 24 hours after I finish fixing them.

@neuralsorcerer
Copy link
Copy Markdown
Collaborator Author

No hurries and thank you for working on it @talgalili

@meta-codesync
Copy link
Copy Markdown

meta-codesync bot commented Nov 11, 2025

@talgalili merged this pull request in 21d657a.

@neuralsorcerer neuralsorcerer deleted the ipfn branch November 12, 2025 07:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants