This repository was archived by the owner on Feb 28, 2024. It is now read-only.


@samuelduchesne

The problem

The LabelEncoder currently sorts the categories when it builds its mapping. This happens because it uses np.unique, which returns a sorted array of unique values. See https://github.com/Elementa-Engineering/scikit-optimize/blob/master/skopt/space/transformers.py#L175-L177 and https://numpy.org/doc/stable/reference/generated/numpy.unique.html

For example:

>>> from skopt.space.space import Categorical

>>> c = Categorical(("c", "b", "a"), transform="label")
>>> c.transform(["a", "b", "c"])
[0, 1, 2]

Note that the returned labels are 0, 1 and 2, which corresponds to the sorted order ("a", "b", "c") even though the specified order was ("c", "b", "a"). This can be counter-intuitive, especially when the order of the categories means something to the user.
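
To make the cause concrete, here is a minimal sketch of how a mapping built on np.unique loses the declared order. It does not reproduce the skopt code; the names `classes` and `mapping` are illustrative only.

```python
import numpy as np

categories = ("c", "b", "a")

# np.unique returns the unique values in sorted order,
# so the order in which the categories were declared is lost.
classes = np.unique(categories)

# Illustrative mapping from category to label, built from the sorted array.
mapping = {value: index for index, value in enumerate(classes.tolist())}

labels = [mapping[value] for value in ["a", "b", "c"]]
print(classes.tolist())  # ['a', 'b', 'c']
print(labels)            # [0, 1, 2]
```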

Implemented Fix

This PR implements a simple fix that retains the declared order of the categories. The expected behavior then becomes:

>>> from skopt.space.space import Categorical

>>> c = Categorical(("c", "b", "a"), transform="label")
>>> c.transform(["a", "b", "c"])
[2, 1, 0]

The declared order is preserved.

The same holds for numeric categories:

>>> from skopt.space.space import Categorical

>>> c = Categorical((10, 30, 20), transform="label")
>>> c.transform([10, 20, 30])
[0, 2, 1]
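
For readers who want to see the idea without skopt installed, the order-preserving behavior can be sketched in plain Python. This is an assumption-laden illustration, not the code in this PR: `ordered_label_mapping` and `transform` are hypothetical helpers that assign each category the index of its first appearance.

```python
def ordered_label_mapping(categories):
    """Map each category to the index of its first appearance."""
    mapping = {}
    for value in categories:
        # setdefault keeps the first index and ignores later duplicates.
        mapping.setdefault(value, len(mapping))
    return mapping


def transform(mapping, values):
    """Encode values with a previously built mapping."""
    return [mapping[value] for value in values]


str_mapping = ordered_label_mapping(("c", "b", "a"))
num_mapping = ordered_label_mapping((10, 30, 20))

print(transform(str_mapping, ["a", "b", "c"]))  # [2, 1, 0]
print(transform(num_mapping, [10, 20, 30]))     # [0, 2, 1]
```

The outputs match the expected behavior described above for both the string and the numeric categories.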

@samuelduchesne changed the title from "Keep order of variables in LabelEncoder" to "[MRG] Keep order of variables in LabelEncoder" on Sep 14, 2021
@samuelduchesne
Author

@kernc, not sure why CI didn't fire up here, but this is ready for a review. :)

@QuentinSoubeyran
Contributor

Try pushing another commit to trigger the CI, perhaps?

@samuelduchesne
Author

> Try pushing another commit to trigger the CI, perhaps?

Still not working! Weird!

@QuentinSoubeyran
Contributor

Well, I'm at a loss here... Have you run the tests locally using pytest? Maybe the CI is crashing outright on this PR, hence no info?
