Skip to content

Conversation

@StefanieSenger
Copy link
Member

What does this implement/fix? Explain your changes.

This PR aims to simplify some things about the development side of metadata routing: how routing and consuming methods are mapped together.

It does not change any functionality, but rather helps to make the code more readable.

  1. MethodMapping.from_str() and it's usage are removed from the entire codebase. The alternative functionality, MethodMapping.add() that was existing next to it already before, is now used instead. This ensures consistency and clarity.

  2. caller and callee in MethodPair() are consistently sorted so that caller comes first. These two have been kwarguments before, but putting caller always first supports code readability.

These should have been two separate PRs really, I recognise. Merging these two things together happened by accident.

@github-actions
Copy link

github-actions bot commented Feb 14, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 35ecc4c. Link to the linter CI: here

@adrinjalali
Copy link
Member

Looking at the diff, I'm not sure if this is simplifying things. The mapping="score" was quite concise and neat, and now there's no way to do that w/o importing another class.

The CI fails and the prints are still there though.

@StefanieSenger
Copy link
Member Author

StefanieSenger commented Feb 15, 2024

I have now removed the print statements (oh my) and added the set_config(enable_metadata_routing=False) at the end of the plot_metadata_routing.py file, because that issue popped up here again. I think this should calm the doctest failures in the CI mostly. There might be a remaining one in grid_search.rst, that has this diff:

     GridSearchCV(cv=5,
    -             estimator=CalibratedClassifierCV(...),
    +             estimator=CalibratedClassifierCV(estimator=RandomForestClassifier(n_estimators=10)),

and that I am not sure why the shorter output was expected and how to handle that yet.

Edit: grid_search.rst did not raise again.

Concerning removing from_str as a whole: yes, mapping="score" is not possible anymore. People need to use MethodMapping in these cases now, too.

I think this is more explicit and easier to understand, since there are not several parallel ways of mapping methods together for the routing, but only one way. It's rather seldom that mapping="score" etc. were used before and applies mainly to testing with mocking classes. Also, meta-estimators that have routing implemented, do import MethodMapping for other parts of the code already.

I believe that having only this one way to do the routing, that is readable even if you don't look into what from_str does, is the simpler way.

@adrinjalali
Copy link
Member

I think this is more explicit and easier to understand, since there are not several parallel ways of mapping methods together for the routing, but only one way. It's rather seldom that mapping="score" etc. were used before and applies mainly to testing with mocking classes. Also, meta-estimators that have routing implemented, do import MethodMapping for other parts of the code already.

I think this is similar to the way people can pass a string as a scorer, or a make_scorer object result. That's normal to me, but I'd need to see what others feel about it.

I think even allowing to pass a list instead of just a string would be okay, and the list results in a one to one mapping between caller and callee.

WDYT @thomasjpfan @glemaitre @OmarManzoor

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think even allowing to pass a list instead of just a string would be okay, and the list results in a one to one mapping between caller and callee.

Are you think of this API for a list of strings?

.add(method_mapping=["fit", "predict", "score"])

It's not obvious to me that the mapping is "one-to-one" from fit to fit.

Maybe:

.add(method_mapping=MethodMapping().add_one_to_one(["fit", "predict", "score"]))

Looking at the method_mapping="one-to-one" API with fresher eyes, I agree that "one-to-one" is a bit implicit. It's hard to tell from the code itself which methods are being routed.

@StefanieSenger
Copy link
Member Author

StefanieSenger commented Feb 18, 2024

Thanks for your insight, @thomasjpfan. 🔸
How do you feel about removing or keeping the method_mapping="score" syntax? @adrinjalali and I both agree on removing method_mapping="one-to-one" , but not on the former question.

@thomasjpfan
Copy link
Member

How do you feel about removing or keeping the method_mapping="score" syntax?

I think it's simple enough to infer "one-to-one" when there is only one method. But then there is a significant complexity jump for adding a second mapping. Concretely:

.add(method_mapping="score")

# Adding the second mapping requires learning another object and syntax:
.add(method_mapping=method_mapping=MethodMapping()
            .add(caller="score", callee="score")
            .add(caller="predict", callee="predict"))

For all one-to-one mappings, I am considering:

# Single string for a single mapping:
.add(method_mapping=MethodMapping().add_one_to_one("score"))

# Add a second mapping by using a list:
.add(method_mapping=MethodMapping().add_one_to_one(["score", "predict"]))

@adrinjalali
Copy link
Member

I agree we can remove one-to-one from MetadataRouter.add, and have a MethodMapping().add_one_to_one or MethodMapping().one_to_one method.

@StefanieSenger
Copy link
Member Author

I'm a bit frustrated with how this discussion goes. It's a PR, not an issue. There is a concrete proposal to discuss and new ideas need to be compared with the proposal.

The goal of my proposal is to simplify the implementation of metadata routing by dropping two of three options for building the mapping for the routing. I had your verbal agreement for removing one-to-one before, @adrinjalali. Now I propose to also remove the other option from the from_str classmethod (passing the name of the one-to-one mapping methods as a string, like method_mapping="score").

This is hardly used (8 times throughout the whole repo) and mainly in test files.
See the number of occurrences in the diff:

sklearn/metrics/_scorer.py, l. 186
sklearn/metrics/tests/test_score_objects.py, l. 1249,1260
sklearn/tests/metadata_routing_common.py, l. 433
sklearn/tests/test_metadata_routing.py, l. 145, 652, 765, 799

In the diff you can also see, that I have substituted it with another, already existing method.

Inventing yet another method instead is not in accordance with the intention of this PR (which aims to simplify).

@adrinjalali
Copy link
Member

We can only have a final agreement on a concrete implementation @StefanieSenger . It's normal that we see the implementation and we realize it doesn't look as good as we thought.

In this particular case, I thought removing one-to-one would be nice, but then we started removing other cases like method_mapping="score". This leads to an alternative proposal to add an explicit method for when all mappings are one-to-one, but only for a subset of the methods.

If you feel frustrated, it's okay to move to another issue / PR and let this rest a bit and you can come back to it with fresh eyes when you feel like it. Or close it and leave it for a later date. But the discussion here is a very normal and expected part of the development cycle.

@StefanieSenger
Copy link
Member Author

I'm sorry @adrinjalali and @thomasjpfan.

Now I know that you were thinking and discussing about facilitating third party development rather than trying to keep MethodMapping.from_string() in our test files because it might entail an additional import. I didn't think about third parties at all and couldn't read this out of the discussion.

I now don't know what I could do or provide to move this forward. Let's see which direction the discussion on #28467 will take.

@StefanieSenger
Copy link
Member Author

I'd need some guidance in translating the consensus from issue #28467 into this PR.

Should I incorporate the creation of the new method here and replace from_str with it? Or rather entirely revert what I tried with from_str here?

Or keep the status quo, which is from_str is removed and not replaced by anything, in which case I would not do anything else here and would wait for reviews.

@thomasjpfan
Copy link
Member

I think we can keep this PR as is (removing from_str). Adding add_one_to_one can be a follow up PR.

@glemaitre glemaitre self-requested a review March 4, 2024 21:18
@glemaitre
Copy link
Member

I merged RidgeCV and RidgeClassifierCV. We should merge main in the branch and check that we don't need any change.

Otherwise the PR LGTM.

@adrinjalali
Copy link
Member

Got a conflict here. Could you fix please @StefanieSenger ? We can merge then.

@adrinjalali
Copy link
Member

more conflicts here @StefanieSenger

@adrinjalali adrinjalali merged commit 2bafd7b into scikit-learn:main May 2, 2024
@StefanieSenger StefanieSenger deleted the remove_one-to-one branch May 3, 2024 09:39
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants