Skip to content

Add JupyterLite-powered interactive galleries to the scikit-image documentation#7644

Draft
agriyakhetarpal wants to merge 59 commits intoscikit-image:mainfrom
agriyakhetarpal:docs/add-interactive-examples
Draft

Add JupyterLite-powered interactive galleries to the scikit-image documentation#7644
agriyakhetarpal wants to merge 59 commits intoscikit-image:mainfrom
agriyakhetarpal:docs/add-interactive-examples

Conversation

@agriyakhetarpal
Copy link
Contributor

@agriyakhetarpal agriyakhetarpal commented Dec 23, 2024

Description

Closes #6958

This pull request aims to enable interactive documentation for scikit-image. Particularly, this relies on JupyterLite and jupyterlite-sphinx, particularly, our new 0.18.0 release – please see https://github.com/jupyterlite/jupyterlite-sphinx/releases/tag/v0.18.0. A list of the changes made in this PR is as follows:

Note

Another kernel, namely, the Xeus kernel is also available for use with JupyterLite (via pip install jupyterlite-xeus) – the difference is that it uses the emscripten-forge distribution instead of the Pyodide distribution noted above. I haven't used it because scikit-image seems to be packaged with version 0.19.3 there, which is a much older version than what is available at the time of writing.

Checklist

Release note

For maintainers and optionally contributors, please refer to the instructions on how to document this PR for the release notes.


Here's my attempt at a release note:

The `scikit-image` documentation now features JupyterLite, which is used to
enable interactivity. The notebooks under the example gallery can now be
executed using Pyodide, a Python distribution that runs in your browser, without
needing to install anything on your system. Please feel free to open an issue in
our GitHub repository if you face any issues.

Additional context

@agriyakhetarpal
Copy link
Contributor Author

There seems to be a bug where I cannot correctly build the "API reference" section of the documentation on a local build – I have just the skimage index and not the rest of the top-level modules; they aren't being generated fully. The sphinx-gallery examples work fine, though. I shall debug with the CircleCI previews for now...

I noticed another bug with sphinx-gallery's notebook paths – I will submit a patch there.

@agriyakhetarpal
Copy link
Contributor Author

I'm sorry, @lesteve – I should have checked that you've had an open PR to do this via #6974 before I started working on this initiative further. How would you want me to proceed? I am happy to either pull in the relevant commits from that branch I haven't implemented here and to grant you authorship as appropriate, or step back and let you finish that one. Now that scikit-image's WASM wheels are being built via #7350 (though not being tested temporarily because of #7576), my further plans for this PR were/are to work on #6958 to complete the full deployment (#6972 (comment)).

@lesteve
Copy link
Contributor

lesteve commented Jan 2, 2025

@agriyakhetarpal no worries and thanks for all the work you are doing on the Pyodide/interactive documentation front!

Feel free to do whatever is simpler for you! I am more than happy for #6974 to be closed and you taking over this work.

@lagru
Copy link
Member

lagru commented Jan 4, 2025

There seems to be a bug where I cannot correctly build the "API reference" section of the documentation on a local build – I have just the skimage index and not the rest of the top-level modules; they aren't being generated fully.

@agriyakhetarpal, I've seen that locally when I installed skimage in editable mode with spin install. I don't think your PR or related to JupyterLite. Sorry for the confusion on your part. I'll see if I can come up with a quick fix. It's very probably something with how our doc/tools/apigen.py not dealing well with the editable hook.

@lagru
Copy link
Member

lagru commented Jan 4, 2025

I proposed a fix in #7647 to make apigen.py work with editable hooks.

@agriyakhetarpal
Copy link
Contributor Author

@lesteve, I was able to move the CORS proxy handling to a skimage.data.patch_cors() function, continuing from #6911. This makes it have a much nicer interface, as we can replace the initialisation code that was exposed to the users via the notebooks previously, like the following

import re
import skimage.data._registry
new_registry_urls = {
    k: re.sub(
        r'https://gitlab.com/(.+)/-/raw(.+)',
        r'https://cdn.statically.io/gl/\1\2',
        url
    )
    for k, url in skimage.data._registry.registry_urls.items()
}
skimage.data._registry.registry_urls = new_registry_urls
    """.splitlines()
        )

with smaller code, as

import skimage.data
skimage.data.patch_cors()

It seems to work quite well. You can try it out here: "Segment human cells (in mitosis)" example — skimage 0.25.3rc0.dev0 documentation

Of course, we can't realistically test all example notebooks, as we discussed. However, I don't think there's a reason why this shouldn't work for any other cases besides this one. The PR is ready for another look!

@agriyakhetarpal agriyakhetarpal marked this pull request as ready for review February 20, 2025 23:49
@stefanv
Copy link
Member

stefanv commented Mar 7, 2025

@agriyakhetarpal Let me know if you want me to take a look at this. I wasn't sure of its current status.

@agriyakhetarpal
Copy link
Contributor Author

@agriyakhetarpal Let me know if you want me to take a look at this. I wasn't sure of its current status.

@stefanv, thanks! It's ready for review.

@agriyakhetarpal
Copy link
Contributor Author

Hi @stefanv, I hope it is okay to ping you again here – this PR stands ready for review. @lesteve recently noted in my contemporary PRs to scikit-learn at scikit-learn/scikit-learn#29791 and scikit-learn/scikit-learn#31078 that the approach I was using here wasn't the best one as it would end up bloating the documentation repository at https://github.com/scikit-image/docs/tree/gh-pages with binary WASM wheels, which I think we should avoid.

I'm now applying mostly the same code as scikit-learn/scikit-learn#31085, which requires #7440 to be merged so that the nightly WASM wheels for scikit-image can show up on https://anaconda.org/scientific-python-nightly-wheels (i.e., when you're able to get to this PR, please review that PR first). The recent commits I've pushed today are mostly in accordance with this change.

In particular, this PR also applies a patch to scikit-image's registry to use the https://statically.io/ CDN as a CORS proxy for data files that are hosted on GitLab. This means that examples requiring such functionality in ski.data won't work, but will after this PR makes it to the nightly wheels. Please let me know if you'd like me to split up the changes I've made to skimage/data/_fetchers.py to a new PR that should be reviewed/merged before this PR (I probably should).

Thank you for your time!

I'm also happy to hop on to another one of scikit-image's community calls to interact with you and the rest of the scikit-image core developers, similar to our previous discussions on #7470 (I'm waiting to get this change addressed before I proceed to work on that!).

Copy link
Member

@stefanv stefanv left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @agriyakhetarpal! This looks pretty close to me. I only had a quick scan, but it's not 100% clear to me how we match the pip install version to the docs version in the non-dev gallery. Otherwise, I see no major hurdles.

'moon',
'nickel_solidification',
'page',
'palisades_of_vogt',
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a change necessary on main?

Copy link
Contributor Author

@agriyakhetarpal agriyakhetarpal Mar 28, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I noticed that this was missing when I previously added a method here (which I've later reverted). This change is relevant but out of scope for this PR; will remove.

pass

# This code path modifies the registry_urls dictionary to route all
# GitLab resources through a CORS proxy (cdn.statically.io), making
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a problem with always routing via CDN, instead of just for pyodide?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My opinion is that we should avoid it. GitLab itself is using Cloudflare for its CDN services and can handle thrown on it, but switching to https://statically.io as a full-fledged and free CDN intended as a CORS proxy would put unnecessary strain on it. The JupyterLite docs would have much less traffic in comparison.

" index_urls='https://pypi.anaconda.org/scientific-python-nightly-wheels/simple',\n"
")",
]
if "dev" in version
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why os the version missing for non-dev pip install?

@agriyakhetarpal
Copy link
Contributor Author

agriyakhetarpal commented Mar 28, 2025

I only had a quick scan, but it's not 100% clear to me how we match the pip install version to the docs version in the non-dev gallery. Otherwise, I see no major hurdles.

Thanks for the review, @stefanv!

This PR was addressing this point and the one you made in #7644 (comment) up to the time when I was bundling the wheel along with the docs, which I removed in later commits as it would increase the size of the https://github.com/scikit-image/docs repository. At this moment, this PR shouldn't close #6958 as it adds a nightly installation of scikit-image for the dev docs when it gets merged; it doesn't sync up the versions for older docs.

I'd say that it is acceptable to add binary files to a repository like that and work with only a shallow clone if we need to modify the repository manually, but there's also a security angle towards adding them, which I cannot ignore. We'll need to figure out a solution for this. I have a few suggestions at hand:

I'm also curious if Loïc would have any suggestions, as this problem is one of the unsolved ones we have. :)

@lesteve
Copy link
Contributor

lesteve commented Mar 31, 2025

My feeling right now for scikit-learn: the JupyterLite/Pyodide setup is experimental. We had zero reports about people being confused because the scikit-learn version on JupyterLite did not match what they expected 😅, probably because not that many people click on the JupyterLite link and/or they don't know where to complain. In a similar fashion, the Binder scikit-learn had been broken for at least 2 years and there had been zero report about it 😅.

Personally, I sum this up as "it is completely fine to be less conservative than usual, let's try to get more people to use the JupyterLite setup, and let's improve it as we go". Of course I fully understand if other projects decide to be more conservative than scikit-learn.

What I envision medium-term for scikit-learn (a bit hacky but 🤷) is to upload Pyodide wheels to anaconda.org scientific-python-nightly-wheel for scikit-learn released versions, so that await micropip.install(f'scikit-learn=={version}', index_urls='https://pypi.anaconda.org/scientific-python-nightly-wheels/simple') will work as well for scikit-learn released versions. This would mean that the 1.6 doc uses scikit-learn 1.6.1 in its JupyterLite and the dev doc uses scikit-learn dev in its JupyterLite.

I think building the Pyodide wheel during the build and shipping it within JupyterLite (see doc) is a fine option. Amongst other things, this should make it easier to keep the same version in the JupyterLite environment and the example gallery. For scikit-learn, I was a bit worried about the impact on the scikit-learn.github.io repo and on complicating the doc build slightly but I may change my mind about this in the future, we will see ...

I haven't tried emscripten-forge recently but my understanding is that you do the equivalent of importing all the installed packages at kernel start-up time, which in my opinion doesn't lead to a super nice user experience 1. Having said this, I completely get it that opinions may differ about the trade-offs involved.

This is the first time I hear about pyodide-jupyter-lock so I am not going to comment on it 😅 ...

Footnotes

  1. For example: the user is waiting too much before being able to execute a simple cell like print('bla'). All packages are imported even if only a small part of the packages are used in a given notebook.

@agriyakhetarpal
Copy link
Contributor Author

@lesteve, @lagru, and I are here in person at the Scientific Python summit. Our goal is to have this PR merged within the next few days. On my radar:

@agriyakhetarpal
Copy link
Contributor Author

I'm marking this as a draft until #7931 is merged and a nightly Pyodide wheel is rereleased for usage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

⏩ type: Enhancement Improve existing features

Projects

None yet

Development

Successfully merging this pull request may close these issues.

JupyterLite docs link is not tied to docs version of skimage

5 participants