Skip to content

Fix performance regression in ColumnTransformer#29330

Merged
ogrisel merged 2 commits intoscikit-learn:mainfrom
jeremiedbb:rev-28822
Jun 28, 2024
Merged

Fix performance regression in ColumnTransformer#29330
ogrisel merged 2 commits intoscikit-learn:mainfrom
jeremiedbb:rev-28822

Conversation

@jeremiedbb
Copy link
Member

@jeremiedbb jeremiedbb commented Jun 21, 2024

Fixes #29229

#28822 was a quick fix for #28781 before releasing 1.5.0. It turns out that it caused a performance regression when the input can't be memory mapped, for instance columns where elements are arrays of variable length. In that case the whole X is copied in each subprocess.

Since #29018 has now been merged, we can revert #28822 because it also fixes #28781 without causing the performance regression (note that I kept the non regression test). The following results of the snippet from #29229 with this PR are close to the 1.4.2 ones:

sklearn version: 1.6.dev0 and n_jobs=1
5: Per col: 0.0206s / total 0.10 s
10: Per col: 0.0203s / total 0.20 s
15: Per col: 0.0204s / total 0.31 s
20: Per col: 0.0201s / total 0.40 s
25: Per col: 0.0214s / total 0.54 s
30: Per col: 0.0204s / total 0.61 s
35: Per col: 0.0202s / total 0.71 s
40: Per col: 0.0200s / total 0.80 s
45: Per col: 0.0208s / total 0.93 s

sklearn version: 1.6.dev0 and n_jobs=2
5: Per col: 0.1613s / total 0.81 s
10: Per col: 0.0352s / total 0.35 s
15: Per col: 0.0321s / total 0.48 s
20: Per col: 0.0334s / total 0.67 s
25: Per col: 0.0299s / total 0.75 s
30: Per col: 0.0296s / total 0.89 s
35: Per col: 0.0283s / total 0.99 s
40: Per col: 0.0294s / total 1.18 s
45: Per col: 0.0294s / total 1.32 s

@github-actions
Copy link

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 2bc3222. Link to the linter CI: here

Copy link
Member

@betatim betatim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me.

I don't think we can add a regression test for this. So it is Ok to not have one.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@ogrisel ogrisel merged commit 4bc61a0 into scikit-learn:main Jun 28, 2024
jeremiedbb added a commit to jeremiedbb/scikit-learn that referenced this pull request Jul 2, 2024
@jeremiedbb jeremiedbb mentioned this pull request Jul 2, 2024
11 tasks
jeremiedbb added a commit to jeremiedbb/scikit-learn that referenced this pull request Jul 2, 2024
snath-xoc pushed a commit to snath-xoc/scikit-learn that referenced this pull request Jul 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

3 participants