
Shift to using numpy.random.Generator and harmonize signature of seed across all sample functions #662

Merged
ConnectedSystems merged 34 commits into SALib:main from quaquel:rng
Oct 12, 2025

Conversation

@quaquel
Contributor

@quaquel quaquel commented Sep 8, 2025

This PR updates all sample functions to accept an integer, None, or a numpy.random.Generator instance for the seed argument. It also makes sure this generator is used throughout the code so that results are reproducible.
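Concretely, the harmonized seed handling can be sketched with numpy.random.default_rng, which already accepts all three input types (the sample function below is illustrative, not SALib's actual implementation):

```python
import numpy as np

def sample(problem, N, seed=None):
    # np.random.default_rng accepts None, an int, or an existing
    # Generator, so a single call normalizes all three seed types.
    rng = np.random.default_rng(seed)
    return rng.random((N, problem["num_vars"]))

problem = {"num_vars": 3}
a = sample(problem, 5, seed=42)
b = sample(problem, 5, seed=np.random.default_rng(42))
# An integer seed and a Generator built from it yield identical draws.
assert np.array_equal(a, b)
```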

I have kept seed as the name of the keyword argument. For SPEC 7 compliance, it should be renamed to rng, but that change would be backward incompatible. It would, of course, be easy to accept either seed or rng for the time being and raise a deprecation warning on seed. If that is desirable, it is easy to add.

See #634 for some background

@EwoutH

EwoutH commented Sep 10, 2025

Without having reviewed the code line-by-line, this looks like a proper step towards standardizing seed passing and SPEC 7 compliance. It's nice it can be done without breaking backwards compatibility.

@quaquel
Contributor Author

quaquel commented Sep 10, 2025

The unit tests fail before the actual testing starts.

  • 3.9 errors on setting up conda with Did not find Mambaforge-Linux-x86_64.sh latest in cache
  • 3.12 errors on setting up conda with a 404 error.

The linter caught a couple of small issues with morris, which are now fixed. The remaining linter issue is that seed/rng appears in the signatures of full factorial and finite difference but is not used internally by either method. I cannot really fix this here; the issue predates this PR.

@ConnectedSystems
Member

Thanks @quaquel

The linter caught a couple of small issues with morris, which are now fixed. The remaining linter issue is that seed/rng appears in the signatures of full factorial and finite difference but is not used internally by either method. I cannot really fix this here; the issue predates this PR.

SALib methods have to define a seed/rng argument to maintain compatibility with the CLI.
There are ways to handle methods that do not require a seed value at all of course, but it's out of scope here I think.

Could anyone point me to a draft/interim SPEC7 document? As far as I can tell it's not yet finalised but I want to see how far we are in alignment with current discussions at least.

@ConnectedSystems
Member

@tupui

Did scipy change the interface for scipy.stats.uniform recently?

Looks like we had been running scipy.stats.uniform(loc=sample * d, scale=sample * d, size=num_groups, seed=<some value>) for years, but now it doesn't work (seed argument not supported).

Looks like the correct usage is now scipy.stats.uniform.rvs(loc=sample * d, scale=sample * d, size=num_groups)

But now all tests that check results within a certain tolerance (for LHS sampling) appear to fail?

@tupui
Member

tupui commented Sep 14, 2025

Mmm, there were some changes to use rng instead of seed. I am surprised it's breaking without a warning, though.

But in general, for basic distributions, there is the new API, which is recommended because it is better in every way.



@quaquel
Contributor Author

quaquel commented Sep 14, 2025

First, thanks for the commits and minor modifications.

SALib methods have to define a seed/rng argument to maintain compatibility with the CLI.
There are ways to handle methods that do not require a seed value at all of course, but it's out of scope here I think.

Yes, let's keep this out of scope for this PR. And indeed, it is relatively easy to fix. I would suggest validating the CLI arguments per method, rather than having a common interface with arguments that don't matter for certain methods.

Could anyone point me to a draft/interim SPEC7 document? As far as I can tell it's not yet finalised but I want to see how far we are in alignment with current discussions at least.

SPEC 7 is endorsed by ipython, networkx, numpy, scikit-image, and scipy, so I assume it has been finalized. The only discrepancy between this PR and SPEC 7 is that this PR still uses seed instead of rng, as SPEC 7 recommends. Apart from that, everything is compliant. The simplest way to make this PR fully compliant, without breaking backward compatibility, would be to:

  1. Accept either seed or rng.
  2. Raise a DeprecationWarning on seed and set rng = seed.
  3. Raise a ValueError if both seed and rng are passed.

So, something like the below should do the trick:

import functools
import warnings

def handle_seed_rng(func):
    @functools.wraps(func)
    def wrapper_handle_seed_rng(*args, **kwargs):
        # False is the sentinel for "not passed", since None is itself
        # a valid value for seed/rng.
        seed = kwargs.pop("seed", False)
        rng = kwargs.pop("rng", False)

        if seed is not False and rng is not False:
            raise ValueError("pass either seed or rng, not both")
        if seed is not False:
            warnings.warn(
                "the use of seed as a keyword argument is deprecated, use rng instead",
                DeprecationWarning,
                stacklevel=2,
            )
            rng = seed
        elif rng is False:
            rng = None

        return func(*args, **kwargs, rng=rng)

    return wrapper_handle_seed_rng

It would be trivial to apply this decorator to all sample functions. Once the use of seed is fully deprecated, just remove the decorator and you're done.
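Repeated here as a self-contained, runnable sketch: the decorator applied to an illustrative sample function (sample is a hypothetical stand-in, not SALib's actual API):

```python
import functools
import warnings

import numpy as np

def handle_seed_rng(func):
    """Map the legacy seed keyword onto rng, with a deprecation warning."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        # False marks "not passed"; None is itself a valid seed/rng value.
        seed = kwargs.pop("seed", False)
        rng = kwargs.pop("rng", False)
        if seed is not False and rng is not False:
            raise ValueError("pass either seed or rng, not both")
        if seed is not False:
            warnings.warn(
                "seed is deprecated, use rng instead",
                DeprecationWarning,
                stacklevel=2,
            )
            rng = seed
        elif rng is False:
            rng = None
        return func(*args, **kwargs, rng=rng)
    return wrapper

@handle_seed_rng
def sample(problem, N, rng=None):
    # Hypothetical sample function with the new-style signature.
    rng = np.random.default_rng(rng)
    return rng.random((N, problem["num_vars"]))

problem = {"num_vars": 2}
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    legacy = sample(problem, 4, seed=1)  # old keyword still works, but warns
assert any(issubclass(w.category, DeprecationWarning) for w in caught)
# The legacy seed path and the new rng path produce identical results.
assert np.array_equal(legacy, sample(problem, 4, rng=1))
```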

But now all tests that check results within a certain tolerance (for LHS sampling) appear to fail?

Not sure what's happening with the tests. I ran all tests locally before creating the PR, and everything was fine on my end.

In line with the numpy docs, Generator.spawn is an equivalent shorthand for the more complicated code I drew from scipy
@quaquel
Contributor Author

quaquel commented Sep 14, 2025

I have addressed a whole bunch of failing tests here, all of which had to do with proper handling of the rng keyword argument. I also realized that analyze/morris was still using numpy.random instead of a Generator, so that is addressed as well.

The remaining failing tests are all assertion errors because the results returned by the updated code do not match what is expected. This is very hard for me to diagnose, because I have no idea where the expected numbers came from.

In many cases, it might just be due to the shift from numpy.random.seed to numpy.random.default_rng; that is, the random number generation algorithm differs from what it was before. The former uses the Mersenne Twister, while the new Generator uses PCG64 by default.
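The difference can be shown directly: the legacy np.random.seed path drives the MT19937 bit generator, while default_rng uses PCG64, so the same seed produces different streams:

```python
import numpy as np

seed = 42

# Legacy global state: Mersenne Twister (MT19937).
np.random.seed(seed)
legacy = np.random.random(3)

# New-style Generator: PCG64 by default.
modern = np.random.default_rng(seed).random(3)

# Same seed, different algorithm, different numbers: tests pinned to
# legacy values will fail after the switch even when the code is correct.
assert not np.allclose(legacy, modern)
```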

@quaquel
Contributor Author

quaquel commented Sep 14, 2025

@tupui

Did scipy change the interface for scipy.stats.uniform recently?

Looks like we had been running scipy.stats.uniform(loc=sample * d, scale=sample * d, size=num_groups, seed=<some value>) for years, but now it doesn't work (seed argument not supported).

Looks like the correct usage is now scipy.stats.uniform.rvs(loc=sample * d, scale=sample * d, size=num_groups)

But now all tests that check results within a certain tolerance (for LHS sampling) appear to fail?

The code on main uses numpy.random.uniform; I rewrote it to use scipy.stats because that is compatible with the new generators (but I made a mistake in doing so, which I have since fixed).

@quaquel
Contributor Author

quaquel commented Sep 14, 2025

I have done some further cleanup and reverted some changes in the analyze functions. At present, quite a few of the analyze functions also use np.random.seed. While fixing failing unit tests, I started to change this as well, but that made the PR too big and unwieldy.

I now have the following failing tests:

FAILED tests/sample/morris/test_morris.py::TestGroupSampleGeneration::test_generate_x_star - AssertionError: 
FAILED tests/sample/test_latin.py::TestLatinSample::test_latin_sample_trivial - AssertionError: 
FAILED tests/sample/test_latin.py::TestLatinSample::test_latin_sample_trivial_group - AssertionError: 
FAILED tests/sample/test_latin.py::TestLatinSample::test_latin_sample_one_group - AssertionError: 
FAILED tests/sample/test_latin.py::TestLatinSample::test_latin_sample_two_groups - AssertionError: 
FAILED tests/test_cli_analyze.py::test_fast - AssertionError: Unexpected FAST results.
FAILED tests/test_cli_analyze.py::test_morris - AssertionError: Results did not match expected values:
FAILED tests/test_cli_analyze.py::test_morris_scaled - AssertionError: Results did not match expected values:
FAILED tests/test_regression.py::TestMorris::test_regression_morris_vanilla - AssertionError: 
FAILED tests/test_regression.py::TestMorris::test_regression_morris_scaled - AssertionError: 
FAILED tests/test_regression.py::TestMorris::test_regression_morris_groups - AssertionError: 
FAILED tests/test_regression.py::TestMorris::test_regression_morris_groups_brute_optim - AssertionError: 
FAILED tests/test_regression.py::TestMorris::test_regression_morris_optimal - AssertionError: 
FAILED tests/test_util.py::test_nonuniform_scale_samples_truncnorm - AssertionError: 

As far as I can tell, having checked each of these tests, the assertions all fail because they test for specific return values. Since the random number generation algorithm has changed, those values evidently change as well, so it is not surprising that these tests fail.

Please let me know how you want me to continue with this PR.

@ConnectedSystems
Member

ConnectedSystems commented Sep 15, 2025

Thanks @quaquel, I'll need a bit more time to go through those tests. Some have existed since before my time on the project.

@willu47 would you have any recollection of which tests are based on known, expected values, and which are based on the results that were being obtained with the Mersenne Twister RNG? If they are all the latter, then the resolution is simply updating them with values from the new RNG, I guess.

Testing against manually run values as a simple indication that no regression has occurred.
@ConnectedSystems
Member

Updated tests: the values used appear to have been extracted manually and served as a simple indicator that nothing has changed unexpectedly.

A next effort would be to test that values fall within the estimated CIs rather than ensuring outputs are identical.
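A sketch of what such a tolerance-based check might look like, using numpy.testing instead of exact equality (the estimate, reference value, and tolerance here are illustrative, not SALib's):

```python
import numpy as np

rng = np.random.default_rng(7)
# Illustrative stand-in for an RNG-dependent analysis result and its
# analytically known reference value.
expected = 0.5
estimate = expected + rng.normal(scale=0.01)

# Rather than pinning exact RNG-dependent numbers, assert the result
# lies within a tolerance derived from, e.g., a confidence interval.
np.testing.assert_allclose(estimate, expected, atol=0.1)
```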

The tests for Python 3.9 are failing due to an issue with Conda. I can confirm that everything works fine locally, however, so I will be merging this shortly.

Removing trailing comma
@ConnectedSystems ConnectedSystems merged commit 55e7079 into SALib:main Oct 12, 2025
1 of 4 checks passed
@quaquel
Contributor Author

quaquel commented Oct 12, 2025

thanks for finalizing this

