Skip to content

Conversation

@trujillo9616
Copy link
Contributor

@trujillo9616 trujillo9616 commented Jan 7, 2022

Reference Issues/PRs

See also #9104. This PR enhances some of the proposed changes in the referenced issue.

What does this implement/fix? Explain your changes.

For _bagging.py:

  • BaseBagging:
    This enhancement renames the base_estimator parameter to estimator in the BaseBagging class for improved readability and consistency. An if function was added to check if the base_estimator was initialized by the user, a warning will be outputted in that case. Since the class BaseEnsemble, from which BaseBagging inherits, still uses the base_estimator parameter, an override for _validate_estimator and set_params was done just for assigning the estimator object to self.base_estimator.

  • BaggingClassifier and BagginRegressor:
    This enhancement renames the base_estimator parameter to estimator in both classes for improved readability and consistency.

For test_bagging.py:
Tests in the test_bagging.py. file were modified for using the updated parameter. Two new test cases was added to check if the warning has been raised when setting the base_estimator parameter in both BaggingClassifier and BaggingRegressor.

Copy link
Contributor

@cmarmo cmarmo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hello @trujillo9616 thanks for following up!
The tests are failing because other classes depend on BaseBagging and the change affects them.
My bad, I didn't take it into account.
Do you mind reverting the changes on BaseBagging and modifying accordingly the other two? Sorry for that.
My other comment should fix the sphinx warnings in the documentation.

Comment on lines +581 to +582
Use `estimator` instead.
.. deprecated:: 1.1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Use `estimator` instead.
.. deprecated:: 1.1
Use `estimator` instead.
.. deprecated:: 1.1

The blank line will remove the sphinx warning... same comment in the Regressor

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review and the feedback @cmarmo ! I'll work on reverting the changes done to BaseBagging and further modifying both Bagging classes. I'll also apply your suggestion for fixing the sphinx warnings. 😁

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one should still be fixed... :)

Copy link
Contributor

@cmarmo cmarmo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One warning is still there.
Tests are not passing, this has something to do with the inheritance... but I'm stuck.
I will try to understand, but perhaps some core dev will give some insights here...

Comment on lines +581 to +582
Use `estimator` instead.
.. deprecated:: 1.1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This one should still be fixed... :)

@trujillo9616
Copy link
Contributor Author

Thanks for the review @cmarmo ! I'm stuck on that failing test 🙁 Still working on that and will update the PR when I have a breakthrough. The empty space was the first thing I modified, I think I undid it when reverting the changes 😅 . I will make sure to modify it on the next update! Thanks again!!

Copy link
Contributor

@cmarmo cmarmo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi, I think I've found out. The common tests are failing because base_estimator changed its value in the initialization function: to avoid this the __init__ shold be

    def __init__(
        self,
        estimator=None,
        n_estimators=10,
        *,
        max_samples=1.0,
        max_features=1.0,
        bootstrap=True,
        bootstrap_features=False,
        oob_score=False,
        warm_start=False,
        n_jobs=None,
        random_state=None,
        verbose=0
    ):

        super().__init__(
            base_estimator=estimator,
            n_estimators=n_estimators,
            max_samples=max_samples,
            max_features=max_features,
            bootstrap=bootstrap,
            bootstrap_features=bootstrap_features,
            oob_score=oob_score,
            warm_start=warm_start,
            n_jobs=n_jobs,
            random_state=random_state,
            verbose=verbose
        )

The issue here is that you are deprecating a positional argument which keyword does not need to be written down. The base_estimator should automatically take the value of the estimator parameter. The FutureWarning will be issued only if one explicitly write

BaggingRegressor(base_estimator=SVC())

Hope I'm not driving you on the wrong path this time....


def _validate_estimator(self):
"""Check the estimator and set the base_estimator_ attribute."""
self.base_estimator = self.estimator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.base_estimator = self.estimator

This line is no longer necessary


def _validate_estimator(self):
"""Check the estimator and set the base_estimator_ attribute."""
self.base_estimator = self.estimator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
self.base_estimator = self.estimator

This line is no longer necessary

@trujillo9616
Copy link
Contributor Author

@cmarmo You're right I am having an Assertion Error because the base_estimator parameter is being mutated on init. Thanks so much for the hint!! 🙌

When running your proposed solution I am getting the following AttributeError: 'BaggingClassifier' object has no attribute 'estimator'. I think that for each parameter declared in the init function, there must be an attribute, so I tried initializing a self.estimator = estimator attribute (even though it will not be used). Running the tests its all green 😁 , however I think we would be breaking with the deprecation cycle, as no warning would be raised when entering the base_estimator parameter, instead an Error would be thrown because both BaggingClassifier and BaggingRegressor would not have base_estimator as a valid input parameter.

    def __init__(
        self,
        estimator=None,
        n_estimators=10,
        *,
        max_samples=1.0,
        max_features=1.0,
        bootstrap=True,
        bootstrap_features=False,
        oob_score=False,
        warm_start=False,
        n_jobs=None,
        random_state=None,
        verbose=0,
        base_estimator="deprecated"
    ):

        super().__init__(
            base_estimator=estimator,
            n_estimators=n_estimators,
            max_samples=max_samples,
            max_features=max_features,
            bootstrap=bootstrap,
            bootstrap_features=bootstrap_features,
            oob_score=oob_score,
            warm_start=warm_start,
            n_jobs=n_jobs,
            random_state=random_state,
            verbose=verbose
        )

       self.estimator = estimator
       self.deprecated = base_estimator

So I think we still need the base_estimator="deprecated" in the init method. I also think that an additional attribute for storing the base_estimator parameter would be needed, in this example I used self.deprecated. But I'm struggling to come up with a solution for making the attribute switch in the fit method, I'm getting the same AssertionError because the base_estimator attribute is being mutated during the switch 😓 .

What do you think? Should I ignore the deprecation cycle or keep trying to find a solution?

@cmarmo
Copy link
Contributor

cmarmo commented Jan 17, 2022

When running your proposed solution I am getting the following AttributeError: 'BaggingClassifier' object has no attribute 'estimator'. I think that for each parameter declared in the init function, there must be an attribute, so I tried initializing a self.estimator = estimator attribute (even though it will not be used). Running the tests its all green

Sorry if I didn't make myself clear: yes the self.estimator = estimator line should be kept (my bad as it was missing in my copy and paste)
If all is green, your tests about deprecation are green too, so the deprecation cycle is applied.
Do you mind pushing this 'all-check-are-green' version, fixing the sphinx warning too and we will moving forward from there? Thank you!

@cmarmo
Copy link
Contributor

cmarmo commented Feb 18, 2022

Hello @trujillo9616 , if you are still interested in working on this, do you mind fixing conflicts and commit the version passing the tests, as we discussed earlier? It is very difficult to suggest modification without updated code. Thank you for your patience.

@EdAbati
Copy link
Contributor

EdAbati commented Jun 28, 2022

Hi @cmarmo , I saw that this is 'Stalled'. I could help with this one if it is ok :)

@cmarmo
Copy link
Contributor

cmarmo commented Jul 1, 2022

Please @EdAbati go on! :)
Thanks for your help!

@cmarmo
Copy link
Contributor

cmarmo commented Jul 7, 2022

I am closing this one as superseded by #23819.
Thanks @trujillo9616 for your work, your commits have been integrated in #23819.

@cmarmo cmarmo closed this Jul 7, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

module:ensemble Superseded PR has been replace by a newer PR

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants