Skip to content

Use scipy.sparse arrays#7576

Merged
jarrodmillman merged 19 commits intoscikit-image:mainfrom
jarrodmillman:sparse_arrays
Nov 19, 2024
Merged

Use scipy.sparse arrays#7576
jarrodmillman merged 19 commits intoscikit-image:mainfrom
jarrodmillman:sparse_arrays

Conversation

@jarrodmillman
Copy link
Contributor

@jarrodmillman jarrodmillman commented Oct 7, 2024

Description

Checklist

Release note

For maintainers and optionally contributors, please refer to the instructions on how to document this PR for the release notes.

Switched to using the `scipy.sparse` array interface. For more details, see the note about the new `scipy.sparse` array interface [here](https://docs.scipy.org/doc/scipy/reference/sparse.html).

@jarrodmillman jarrodmillman added the 🔧 type: Maintenance Refactoring and maintenance of internals label Oct 7, 2024
@jarrodmillman jarrodmillman marked this pull request as draft October 7, 2024 03:48
@jarrodmillman jarrodmillman added this to the 0.26 milestone Oct 7, 2024
@jarrodmillman jarrodmillman modified the milestones: 0.26, 0.25 Oct 14, 2024
@jarrodmillman jarrodmillman marked this pull request as ready for review October 14, 2024 19:03
@jarrodmillman
Copy link
Contributor Author

jarrodmillman commented Oct 15, 2024

@stefanv @dschult Any thoughts about the error in this PR?

The main error is

 =========================== short test summary info ============================
FAILED graph/tests/test_rag.py::test_cut_normalized - assert 0 == 1
 +  where 0 = <built-in method max of numpy.ndarray object at 0x7fed54c69470>()
 +    where <built-in method max of numpy.ndarray object at 0x7fed54c69470> = array([[0, 0, 0, ..., 0, 0, 0],\n       [0, 0, 0, ..., 0, 0, 0],\n       [0, 0, 0, ..., 0, 0, 0],\n       ...,\n       [0, 0, 0, ..., 0, 0, 0],\n       [0, 0, 0, ..., 0, 0, 0],\n       [0, 0, 0, ..., 0, 0, 0]], dtype=uint8).max
= 1 failed, 8933 passed, 45 skipped, 89 xfailed, 2 xpassed, 1 warning in 256.41s (0:04:16) =

That looks like a bug to me, but I am not sure.

When we test against our minimum requirements we use scipy-1.10.0 and get errors like this as well:

E       AttributeError: module 'scipy.sparse' has no attribute 'diags_array'

because I changed

px_inv = sparse.diags(_invert_nonzero(px))

to

px_inv = sparse.diags_array(_invert_nonzero(px))

What is the recommend upgrade path for projects that want to switch to sparse arrays, but plan to support scipy 1.10 and above?

We are also getting errors like this with scipy-1.10.0:

 >               raise ValueError("inconsistent shapes")
E               ValueError: inconsistent shapes

other      = <3370x6998 sparse matrix of type '<class 'numpy.float64'>'
	with 13909 stored elements in Compressed Sparse Row format>
self       = <6998x6998 sparse array of type '<class 'numpy.float64'>'
	with 34642 stored elements in Compressed Sparse Row format>

/opt/hostedtoolcache/Python/3.10.15/x64/lib/python3.10/site-packages/scipy/sparse/_compressed.py:416: ValueError

@jarrodmillman
Copy link
Contributor Author

jarrodmillman commented Oct 15, 2024

According to https://scipy.github.io/devdocs/reference/generated/scipy.sparse.diags_array.html#scipy.sparse.diags_array diags_array was added in 1.11. But I couldn't find it in the documentation for 1.11. I went searching and it looks like I may have taken it out: scipy/scipy#18599

All the constructors seem to have been introduced in 1.12. But only diags_array has an "Added in version" notice (and it is wrongly states 1.11).

@stefanv
Copy link
Member

stefanv commented Oct 15, 2024

Dan just submitted an update to the docs, I don't think it fixes the version, but dia_array can often be instantiated directly, instead of using diags.

@jarrodmillman
Copy link
Contributor Author

What should we do about the dtype issue?

@stefanv
Copy link
Member

stefanv commented Oct 15, 2024

Dan figured out that we need to ensure that the index dtype is 32-bit for pyamg. Currently, dtype is inferred from type used in constructor, I believe, without a way to make it explicit. Bad :) There is an internal method to use to fix that. Hopefully I can get to it on Friday, or maybe Dan is feeling generous before then.

@dschult
Copy link
Contributor

dschult commented Oct 16, 2024

I believe diags_array was part of the _construct.py rewrite that occurred after 1.11. So it probably didn't get into sparse until 1.12. But we can use dia_array instead if we're careful. The difference is that dia_array pads the data values so that the length of all diagonals is the same. That way it can be held in a single ndarray instead of a sequence of 1D-arrays with different lengths like with diags_array.

The calls to amg_core need to have indices and indptr be int32. So one way to handle it is to downcast just before those calls.

Can you give me an overview of the goals or intents and scope of this PR? The OP looks anemic. :) What are we working toward?

@dschult
Copy link
Contributor

dschult commented Oct 16, 2024

I made a PR to @jarrodmillman 's branch. Downcasting before sending to pyamg seems to "work", but inside pyamg they are converting back to csr_matrix, and they warn that it might not be efficient. That's cause they identify that it has CSR format using isspmatrix_csr which only finds spmatrix... not sparray. So, clearly some work is needed in pyamg too.

@jarrodmillman
Copy link
Contributor Author

@dschult Thanks!! It looks like your PR only fixes the CI when pyamg is installed (i.e., linux-cp3.12-optional-deps). In the other tests, whne pyamg is not installed, we are getting same error as before.

Copy link
Contributor

@dschult dschult left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To fix the lack of diags_array in earlier versions of scipy, let's try constructing the dia_array directly. It's just the main diagonal so shouldn't mess up length of the diagonals (which are treated differently for diags_array and dia_array when offset != 0). (untested)

@dschult
Copy link
Contributor

dschult commented Oct 17, 2024

It looks like the only failing test now is test_cut_normalized. And as far as I can see it has not worked all the way along.

I am getting different results locally when I run it. It looks like the rag graph is a complete graph on the 4 regions. And that makes sense with the result that no cut occurs and they are all in the same group. But sometimes the result is all 1s and sometimes all 0s. I thought maybe it was the in_place argument allowing changes from one call to affect another call, but that's not it.

Why would the result change from one run to another? Is there roundoff error or randomness involved?

@dschult
Copy link
Contributor

dschult commented Oct 18, 2024

Looks like the cut_normalized error is due to a matrix multiplication that was still using * instead of @ so it became element-wise multiplication after the conversion to sparray.

skimage/graph/_graph_cut.py:281

It looks like there might be others: git grep '[ a-zA-Z_]\*[ a-zA-Z]' skimage/graph
(and maybe throughout the whole library -- at least where spmatrix was being used).

Step 1 of the magration guide is probably the hardest. Can we create a tool that runs code/tests and raises whenever * is used between two spmatrix? We can make a version of scipy with that trait, but is there a quick-and-dirty way to flag *?

@jarrodmillman jarrodmillman force-pushed the sparse_arrays branch 2 times, most recently from 1827a90 to 2cc3ebb Compare October 18, 2024 16:32
@dschult
Copy link
Contributor

dschult commented Oct 18, 2024

Looks like another * vs @ trouble -- this time inside pyamg.
And the Pyiodide error seems to be related to the float16 dtype which scipy does not support.
It seems like that would be a problem for spmatrix though, so I don't know how this was passing before this PR.
Clearly I don't understand something. :)

I have created a way to look for places where * is likely to need changing. It can be used with any script, but I'm thinking of suggesting that it be used inside conftest.py so you run all the tests and it reports any time a spmatrix gets into __mul__ and other is not a scalar. (That's got to be matrix multiplication.) It also checks __rmul__, __imul__ and __pow__. The idea is to add this code to conftest.py near the top and then run all your tests. The failures tell you where you are using * with a spmatrix. Does this help with skimage? (It shouldn't come up with any in this PR because you changed spmatrix to sparray -- but you could use it with main to see where the places are to check.) And of course it can show you the cases in another package like pyamg, but changing those is harder.

Any suggestions/thoughts are appreciated. I'm thinking of adding it to the migration guide along with possibly helpful git grep expressions. It is really the "pass 1" changes that are hard. Replacing old idioms, index dtypes and switching from * to @ are the tricky parts (that I've run into anyway).

Code to find spmatrix ops
import scipy


class _strict_mul_mixin:
    def __mul__(self, other):
        if not scipy.sparse._sputils.isscalarlike(other):
            raise ValueError('Operator * used here! Change to @?')
        return super().__mul__(other)

    def __rmul__(self, other):
        if not scipy.sparse._sputils.isscalarlike(other):
            raise ValueError('Operator * used here! Change to @?')
        return super().__rmul__(other)
        
    def __imul__(self, other):
        if not scipy.sparse._sputils.isscalarlike(other):
            raise ValueError('Operator * used here! Change to @?')
        return super().__imul__(other)
        
    def __pow__(self, *args, **kwargs):
        raise ValueError('spmatrix ** found! Use linalg.matrix_power?')

class _strict_coo_matrix(_strict_mul_mixin, scipy.sparse.coo_matrix):
    pass

class _strict_bsr_matrix(_strict_mul_mixin, scipy.sparse.bsr_matrix):
    pass

class _strict_csr_matrix(_strict_mul_mixin, scipy.sparse.csr_matrix):
    pass

class _strict_csc_matrix(_strict_mul_mixin, scipy.sparse.csc_matrix):
    pass

class _strict_dok_matrix(_strict_mul_mixin, scipy.sparse.dok_matrix):
    pass

class _strict_lil_matrix(_strict_mul_mixin, scipy.sparse.lil_matrix):
    pass

class _strict_dia_matrix(_strict_mul_mixin, scipy.sparse.dia_matrix):
    pass

scipy.sparse.coo_matrix = scipy.sparse._coo.coo_matrix = _strict_coo_matrix
scipy.sparse.bsr_matrix = scipy.sparse._bsr.bsr_matrix = _strict_bsr_matrix
scipy.sparse.csr_matrix = scipy.sparse._csr.csr_matrix = _strict_csr_matrix
scipy.sparse.csc_matrix = scipy.sparse._csc.csc_matrix = _strict_csc_matrix
scipy.sparse.dok_matrix = scipy.sparse._dok.dok_matrix = _strict_dok_matrix
scipy.sparse.lil_matrix = scipy.sparse._lil.lil_matrix = _strict_lil_matrix
scipy.sparse.dia_matrix = scipy.sparse._dia.dia_matrix = _strict_dia_matrix

@jarrodmillman jarrodmillman force-pushed the sparse_arrays branch 2 times, most recently from 1e8ec1e to 9f5d6b2 Compare October 22, 2024 16:08
# Don't use the cache provider plugin, as it doesn't work with Pyodide
# right now: https://github.com/pypa/cibuildwheel/issues/1966
pytest -svra -p no:cacheprovider --pyargs skimage
# pytest -svra -p no:cacheprovider --pyargs skimage
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will be re-enabled in a follow-up PR.

@jarrodmillman jarrodmillman marked this pull request as draft October 22, 2024 19:22
@jarrodmillman jarrodmillman marked this pull request as ready for review November 19, 2024 02:34
@jarrodmillman jarrodmillman merged commit 14540c8 into scikit-image:main Nov 19, 2024
@lagru
Copy link
Member

lagru commented Nov 19, 2024

Just curious why you chose to use a merge commit instead of squashing here? The PR commit history doesn't seem particularly useful to preserve?

@stefanv
Copy link
Member

stefanv commented Jan 9, 2025

Just curious why you chose to use a merge commit instead of squashing here? The PR commit history doesn't seem particularly useful to preserve?

GH remembers the default, so it's easy to accidentally revert to one or the other. Oh well.

lagru added a commit to lagru/scikit-image that referenced this pull request Jan 31, 2025
Leaving this commented is probably an oversight from scikit-imagegh-7576 /
a7b54a8.
lagru added a commit that referenced this pull request Jan 31, 2025
* Use pytest config in pyroject.toml in CI

This should be safe, as the pytest documentation [1] states that
"rootdir is NOT used to modify sys.path/PYTHONPATH". So while the
`--config-file` option affects the `rootdir`, it's not adding that
directory to the PYTHONPATH where it could accidentally overwrite the
installed package. This solution is somewhat unsatisfying. Personally,
I think a better approach would be to switch to the src-layout.

[1] https://docs.pytest.org/en/7.1.x/reference/customize.html?highlight=rootdir#initialization-determining-rootdir-and-configfile

* Use correct operator to check if GITHUB_WORKSPACE exists

* Ignore missing dask dependency in cycle_spin

* Don't expect newbyteorder warning caused by tifffile

I believe this warning disappeared with tifffile 2024.4.24 which added
support for NumPy 2.

* Ignore warning if cg_mg is not available (missing pyamg)

* Update test scripts in new locations

* Pin higher version of pytest

* Pin newer dask to get rid of distutils warning

* Expect sparse efficiency warnings until new pyamg

* New dask version requires newer cloudpickle

* Tell pytest where to find SparseEfficiencyWarning

* Catch additional warnings

* Up minimum version of SciPy to fix sparse errors

* Catch additional warning

* Ignore warnings raised by pyamg internally

* Use a reasonable range in simpleitk roundtrip test

* Re-enable testing in Emscripten/Pyodide job

Leaving this commented is probably an oversight from gh-7576 /
a7b54a8.

---------

Co-authored-by: Stefan van der Walt <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

📜 type: API Involves API change(s)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants