[WIP] Example of multiple imputation with IterativeImputer #13025
sergeyf wants to merge 59 commits into scikit-learn:main from
|
Paging @jnothman and @RianneSchouten. |
|
It might be good to amend that first commit with |
ecfdfc5 to 4f59d37
4f59d37 to 999bfa0
|
OK, I think that worked. |
|
I suspect that without |
|
I've created examples/impute/README.txt in the iterativeimputer branch. |
|
Oh, no. Did I break the doctest again? |
|
@jnothman I wandered back to this PR, and now it passes tests! Any interest in picking up work on this? I feel like it was in a pretty good place already and we were just unsure about the extremely long runtimes. |
|
@jnothman @glemaitre Any thoughts on my last comment? Repeated here: "Any interest in picking up work on this? I feel like it was in a pretty good place already and we were just unsure about the extremely long runtimes." |
|
I found the current docs ambiguous and believe the community would value this work. |
|
I'm not familiar with these build trigger checks. Can anyone please suggest how to fix it? |
Syncing with the |
|
Thank you @thomasjpfan! Any idea if we can get this merged once all the tests pass? |
This PR still needs two approvals to get merged, and I do not have a time estimate for when that will happen. At a glance, I see two big tasks for this PR:
|
|
Thanks! I can make those changes. What regression dataset do you recommend to replace Boston? Smaller is best because this example is hefty, but I can always subsample any dataset. |
I would like to see how this example integrates within the proposal in #21967. |
|
@glemaitre MICE is in the family of multiple imputation methods: perform imputation multiple times, then apply your subsequent pipeline to each imputed dataset, and end up with multiple solutions. For To summarize:
|
|
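The workflow described above (impute several times, run the downstream model on each imputed dataset, end up with several solutions) can be sketched roughly as follows. This is only an illustration, not the code from this PR: the synthetic data, the 20% missingness rate, the choice of `LinearRegression`, and the number of imputations are all assumptions made for the example. It relies on `IterativeImputer(sample_posterior=True)` with a different `random_state` per run so that each imputed dataset is a different draw.

```python
import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import LinearRegression

# Synthetic regression data (assumption: purely illustrative).
rng = np.random.RandomState(0)
n_samples, n_features = 200, 3
X_full = rng.normal(size=(n_samples, n_features))
# Make the third feature depend on the first so imputation has signal to use.
X_full[:, 2] = 0.5 * X_full[:, 0] + rng.normal(scale=0.5, size=n_samples)
y = X_full @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=n_samples)

# Remove roughly 20% of the entries completely at random.
mask = rng.uniform(size=X_full.shape) < 0.2
X_missing = X_full.copy()
X_missing[mask] = np.nan

n_imputations = 5  # assumption: small number chosen to keep the sketch fast
imputed_sets = []
coefs = []
for seed in range(n_imputations):
    # sample_posterior=True draws imputations from the predictive posterior,
    # so different seeds give different completed datasets.
    imputer = IterativeImputer(sample_posterior=True, max_iter=10, random_state=seed)
    X_imputed = imputer.fit_transform(X_missing)
    imputed_sets.append(X_imputed)
    # Apply the same downstream model to each completed dataset.
    coefs.append(LinearRegression().fit(X_imputed, y).coef_)

coefs = np.array(coefs)                   # one row of coefficients per imputation
pooled_coef = coefs.mean(axis=0)          # pooled point estimate
between_var = coefs.var(axis=0, ddof=1)   # spread caused by the imputations

print("pooled coefficients:", pooled_coef)
print("between-imputation variance:", between_var)
```

The between-imputation variance is what a single imputation cannot provide: it shows how much the downstream estimate moves because the missing values are uncertain.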
I agree that having an example defining what "multiple imputations" means is important to remove the confusion with the iterative procedure of the In this regard, I would prefer to have a single pipeline that performs single imputation and then create a specific estimator to show how to perform multiple imputations. We would not even need to use an I think it is very important to point out in the discussion that the example aims to provide a definition of "multiple imputations with code" rather than to show that multiple imputations work better. I am not sure that there is currently any evidence in the ML setting that multiple imputations work better than using a strong learner (@GaelVaroquaux and @marineLM have better insights than me on this). |
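One way to read the suggestion above is a single-imputation pipeline next to a small dedicated estimator that wraps several such pipelines. The sketch below is a guess at what that could look like, not an existing scikit-learn class and not the PR's code: `MultipleImputationRegressor`, its parameters, the `Ridge` default, and averaging predictions as the pooling rule are all hypothetical choices made for illustration.

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline


class MultipleImputationRegressor(RegressorMixin, BaseEstimator):
    """Hypothetical helper: fit one imputer + regressor pipeline per
    imputation draw and average their predictions."""

    def __init__(self, regressor=None, n_imputations=5, random_state=0):
        self.regressor = regressor
        self.n_imputations = n_imputations
        self.random_state = random_state  # assumption: an integer seed

    def fit(self, X, y):
        base = Ridge() if self.regressor is None else self.regressor
        self.pipelines_ = []
        for i in range(self.n_imputations):
            # Each pipeline gets its own posterior draw via a distinct seed.
            pipe = make_pipeline(
                IterativeImputer(sample_posterior=True,
                                 random_state=self.random_state + i),
                clone(base),
            )
            self.pipelines_.append(pipe.fit(X, y))
        return self

    def predict(self, X):
        # Pool by averaging the per-imputation predictions.
        return np.mean([p.predict(X) for p in self.pipelines_], axis=0)


# Illustrative data with values missing completely at random.
rng = np.random.RandomState(42)
X = rng.normal(size=(150, 4))
y = X[:, 0] - X[:, 1] + rng.normal(scale=0.1, size=150)
X[rng.uniform(size=X.shape) < 0.15] = np.nan

# Single imputation: one pipeline, one completed dataset.
single = make_pipeline(IterativeImputer(random_state=0), Ridge()).fit(X, y)
# Multiple imputations: several pipelines pooled by the wrapper.
multiple = MultipleImputationRegressor(n_imputations=5).fit(X, y)

pred_single = single.predict(X)
pred_multiple = multiple.predict(X)
print(pred_single[:3], pred_multiple[:3])
```

The sketch only defines the two procedures side by side; it makes no claim that the pooled predictions are more accurate than the single pipeline's.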
|
Cc @A-pl (we need to put your paper on HAL) |
|
I'm a bit confused. We do have a single pipeline in example 2: https://github.com/sergeyf/scikit-learn/blob/iterativeimputer_mice_example/examples/impute/plot_multiple_imputation.py#L303 And it's used multiple times to do MICE: https://github.com/sergeyf/scikit-learn/blob/iterativeimputer_mice_example/examples/impute/plot_multiple_imputation.py#L315 Can you please clarify what you'd like changed? |
Adding to #11977. This PR is a restart of #11370, which got messy.
Here is a quote from #11370 that explains what this PR does: