Skip to content

Run scipy tests as part of the Github Action CI#4935

Merged
hoodmane merged 18 commits intopyodide:mainfrom
lesteve:full-scipy-tests
Jul 17, 2024
Merged

Run scipy tests as part of the Github Action CI#4935
hoodmane merged 18 commits intopyodide:mainfrom
lesteve:full-scipy-tests

Conversation

@lesteve
Copy link
Contributor

@lesteve lesteve commented Jul 15, 2024

Description

It would make a lot of sense to have Pyodide developers be able to run Scipy tests rather than asking me to do it manually based on https://github.com/lesteve/scipy-tests-pyodide when needed. Last time this happened is for the f2c fork PR #4920 (comment). You could imagine that the Scipy 1.13 PR #4719 would benefit from this as well as potentially many other PRs.

I am testing this on my fork, see GH logs from my PR branch in particular this GH log where the scipy tests ran without error.

This is a bit of a draft PR as there are a few questions:

  1. when do we want to trigger the full Scipy tests? There was a Hood's comment that suggested when some files changes in a few folders e.g. scipy, f2c, etc .... do we want to always run it on main? Do we want a commit-based trigger like [full-scipy] (in scikit-learn we use them for a few things, like forcing a full doc-build, running the Pyodide CI in a PR, etc ...)
  2. this would simplify my life if it was OK to to always build scipy in the "core" build (I would estimate maybe 15-20 minutes added to build time) ... tests would only be run on some specific triggers. Right now, I separated scipy build from the core build but right now it is compiling everything again. I guess because I need to add more things in the cache although I don't know exactly which ones (emsdk, cpython, packages at the very least). I guess CircleCI does that with workspaces.
  3. test strategy is the same as the one I currently use in https://github.com/lesteve/scipy-tests-pyodide, i.e. run with node and a js wrapper script. One of the reason of still having this wrapper script is that I have some work-arounds to remove some Scipy test files whose import is problematic.

Side-comment: I am using -m 'not slow' because otherwise I saw some weird memory issue towards the end of Scipy tests, see this build log where some tests can not allocate enough memory for the numpy arrays. Locally I get a Pyodide fatal error at 97% of the test, not sure whether this is related.

One of the error
 _______________ TestGoodnessOfFit.test_against_anderson_gumbel_r _______________

self = <scipy.stats.tests.test_fit.TestGoodnessOfFit object at 0xeda4d90>

    @pytest.mark.slow
    def test_against_anderson_gumbel_r(self):
        rng = np.random.default_rng(7302761058217743)
        # c that produced critical value of statistic found w/ root_scalar
        x = stats.genextreme(0.051896837188595134, loc=0.5,
                             scale=1.5).rvs(size=1000, random_state=rng)
>       res = goodness_of_fit(stats.gumbel_r, x, statistic='ad',
                              random_state=rng)

/lib/python3.12/site-packages/scipy/stats/tests/test_fit.py:849: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
/lib/python3.12/site-packages/scipy/stats/_fit.py:1138: in goodness_of_fit
    res = stats.monte_carlo_test(data, rvs, statistic_fun, vectorized=True,
/lib/python3.12/site-packages/scipy/_lib/_util.py:779: in wrapper
    return fun(*args, **kwargs)
/lib/python3.12/site-packages/scipy/stats/_resampling.py:910: in monte_carlo_test
    null_distribution.append(statistic(*resamples, axis=-1))
/lib/python3.12/site-packages/scipy/stats/_fit.py:1136: in statistic_fun
    return compare_fun(rfd_dist, data)
/lib/python3.12/site-packages/scipy/stats/_fit.py:1217: in _anderson_darling
    Si = (2*i - 1)/n * (dist.logcdf(x) + dist.logsf(x[..., ::-1]))
/lib/python3.12/site-packages/scipy/stats/_distn_infrastructure.py:499: in logsf
    return self.dist.logsf(x, *self.args, **self.kwds)
/lib/python3.12/site-packages/scipy/stats/_distn_infrastructure.py:2207: in logsf
    place(output, cond, self._logsf(*goodargs))
/lib/python3.12/site-packages/scipy/stats/_distn_infrastructure.py:1007: in _logsf
    return log(self._sf(x, *args))
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <scipy.stats._continuous_distns.gumbel_r_gen object at 0x1f2111b0>
x = array([11.92907778,  9.6840553 ,  6.65586868, ..., -1.75948008,
       -1.77058159, -1.92686803])

    def _sf(self, x):
>       return -sc.expm1(-np.exp(-x))
E       numpy.core._exceptions._ArrayMemoryError: Unable to allocate 76.3 MiB for an array with shape (9999000,) and data type float64

/lib/python3.12/site-packages/scipy/stats/_continuous_distns.py:4024: MemoryError

Checklists

I am guessing none of the checklist items apply here so I ticked all of them ...

  • Add a CHANGELOG entry
  • Add / update tests
  • Add new / update outdated documentation

@agriyakhetarpal
Copy link
Member

Hi, @lesteve, thanks! I may not have the full context of how intensive the tests can be for CI, as others do. Still, I think responding to the workflow_dispatch: event and, therefore, running the workflow manually through the Actions tab would be a good idea since not all PRs will update things for SciPy or push relevant changes. This way, the workflow won't run for PRs through forks... but we can always ask the author of a relevant PR to trigger it on their fork and paste a link to their logs. Running from one's fork would also help avoid straining CI resources for the Pyodide core repository.

@ryanking13
Copy link
Member

Thanks @lesteve! I really needed this :)

Do we want a commit-based trigger like [full-scipy]

I prefer this one.

this would simplify my life if it was OK to to always build scipy in the "core" build.

I think it is okay to do that. We don't build all packages in GHA, as we already build everything in CricleCI. But adding scipy seems reasonable to me. The situation will change if we unvendor recipes later anyways (#4918).

@lesteve
Copy link
Contributor Author

lesteve commented Jul 15, 2024

OK thanks for your inputs, in b2ef91a I now build scipy in the "core" build.

I will look at adding a trigger based on commit message next.

@hoodmane
Copy link
Member

Thanks @lesteve!

@lesteve
Copy link
Contributor Author

lesteve commented Jul 16, 2024

Summary:

  • precommit.ci is unhappy but this should be fixed by Update codespell version in pre-commit #4938 I think
  • right now test-scipy is triggered on push and on PR commits if [scipy] is in the commit message. I can change the commit marker as you wish, let me know if you have suggestions! It feels like triggering on push is a good idea because a PR can be merged without thinking it may break scipy tests but then you have test-scipy run on main so you have a chance to notice.
  • the commit based trigger inside a PR seems to be working: test-scipy is run if [scipy] is in the commit message in 3e93863 build URL test-scipy is skipped in bbd36ec build URL.
  • this roughly adds 15 minutes to build-core on ubuntu-latest (from ~15 minutes to ~30 minutes). Edit: in my last commit 46371cd the scipy build is only triggered when needed.
  • test-scipy is an additional 20 minutes when triggered

@hoodmane
Copy link
Member

I'll go ahead and merge this since the firefox test is a flake and pre-commit is hopefully fixed on main by #4938.

@hoodmane
Copy link
Member

Thanks @lesteve!

@lesteve
Copy link
Contributor Author

lesteve commented Jul 16, 2024

Yeah pre-commit.ci is green, I figured that the issue with codespell was actually to use small letters in the ignore file, oh well ...

At least the last commit is a good additional sanity check to show that when the [scipy] marker is not in the commit message, scipy is not built and the scipy tests is skipped. Edit it does indeed: see overview and build-core job log

Copy link
Member

@ryanking13 ryanking13 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @lesteve!

@lesteve
Copy link
Contributor Author

lesteve commented Jul 16, 2024

The CircleCI build-libraries step timed out after 1 hour in my previous commit see build log, this seems to happen in main as well recently: see this.

At least the scipy tests pass after having merged main, see build log

@hoodmane
Copy link
Member

Scipy tests pass so I'll merge it. Thanks again @lesteve!

@hoodmane hoodmane merged commit d471855 into pyodide:main Jul 17, 2024
@lesteve lesteve deleted the full-scipy-tests branch July 17, 2024 07:53
@lesteve
Copy link
Contributor Author

lesteve commented Jul 17, 2024

Great, thanks!

Quite happy that I finally took the time to contribute this and that this can be used by others!

Except a few quirks, it was almost 100% fun 😉

@hoodmane
Copy link
Member

Thanks so much for your help with scipy! It makes my life a lot easier and I think we are getting better outcomes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants