CI: Use GCP-backed kernel cache in Windows CI by kmaehashi · Pull Request #9738 · cupy/cupy

kmaehashi · 2026-02-21T10:19:04Z

Part of #9665. Based on #9737, this PR introduces a GCP-backed kernel cache to Windows CI. The cache is activated only when the CUPY_CI_ENABLE_GCP_KERNEL_CACHE=1 environment variable is set when invoking pytest.
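
As a rough illustration of the opt-in switch described above, the check could look something like the following. This is a minimal sketch, not the code in this PR: only the variable name comes from the description, while the helper name `gcp_kernel_cache_enabled` and the exact-match rule for "1" are assumptions.

```python
import os

# Name of the opt-in variable, taken from the PR description.
ENV_VAR = "CUPY_CI_ENABLE_GCP_KERNEL_CACHE"


def gcp_kernel_cache_enabled(environ=None):
    """Return True only when the opt-in variable is exactly "1".

    Hypothetical helper: the PR text does not show how the flag is parsed.
    """
    env = os.environ if environ is None else environ
    return env.get(ENV_VAR, "0") == "1"
```

Passing a plain dict instead of `os.environ` keeps the check easy to exercise without touching the real environment.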

Disclosure: The initial implementation was done by Copilot in kmaehashi#84.

Observations:

First run (no cache available): https://ci.preferred.jp/cupy.win.cuda130/210536/ -> timed out around 38%
Second run: https://ci.preferred.jp/cupy.win.cuda130/210537/ -> timed out around 65%
Third run: https://ci.preferred.jp/cupy.win.cuda130/210538/ -> All tests completed (i.e., cache fully populated)
Fourth run: https://ci.preferred.jp/cupy.win.cuda130/210543/ -> All tests completed in 2h13m (roughly 13m for build, 2h for unit tests)

kmaehashi · 2026-02-21T10:20:23Z

/test windows,cuda130

This will take 6 hours and will likely result in a timeout. The second invocation should be faster (I hope).

leofang · 2026-02-22T01:10:13Z

/test windows,cuda130

kmaehashi · 2026-02-22T04:23:00Z

(I forgot to do pip install google-cloud-storage...)

/test windows,cuda130

kmaehashi · 2026-02-22T04:45:28Z

(Confirmed that kernel cache started to appear in the GCS bucket.)

kmaehashi · 2026-02-22T09:52:34Z

/test windows,cuda130

leofang · 2026-02-22T15:40:51Z

/test windows,cuda130

kmaehashi · 2026-02-23T02:01:37Z

/test windows,cuda130

kmaehashi · 2026-02-23T04:50:54Z

This looks effective. In the fourth run (i.e., with the cache fully populated by previous runs), the build + unit test run completed in 2h13m.

kmaehashi · 2026-02-24T09:01:14Z

/test windows,cuda130

kmaehashi · 2026-02-26T04:21:02Z

The CI failure should be fixed by the latest cuda-pathfinder (v1.4.0)

/test windows,cuda130

leofang · 2026-02-27T07:54:15Z

There are 4 test failures. I'll check tomorrow if we should just skip the tests when the GCP backend is in use (because we'd never be able to change the cache dir locally with use_temporary_cache_dir()).

================================== FAILURES ===================================
___ TestRaw_param_0_{backend='nvrtc', in_memory=False}.test_compile_kernel ____

self = <<cupy_tests.core_tests.test_raw.TestRaw_param_0_{backend='nvrtc', in_memory=False} testMethod=test_compile_kernel>  parameter: {'backend': 'nvrtc', 'in_memory': False}>

    @unittest.skipUnless(not cupy.cuda.runtime.is_hip,
                         'only CUDA raises warning')
    @pytest.mark.thread_unsafe(reason="mutates global cache directory")
    def test_compile_kernel(self):
        kern = cupy.RawKernel(
            _test_compile_src, 'test_op',
            options=('-DOP=+',),
            backend=self.backend,
            jitify=self.jitify)
        log = io.StringIO()
        with use_temporary_cache_dir():
            kern.compile(log_stream=log)
>       assert 'warning' in log.getvalue()
E       AssertionError: assert 'warning' in ''
E        +  where '' = <built-in method getvalue of _io.StringIO object at 0x0000016836E8F6D0>()
E        +    where <built-in method getvalue of _io.StringIO object at 0x0000016836E8F6D0> = <_io.StringIO object at 0x0000016836E8F6D0>.getvalue

cupy_tests\core_tests\test_raw.py:1089: AssertionError
___ TestRaw_param_0_{backend='nvrtc', in_memory=False}.test_compile_module ____

self = <<cupy_tests.core_tests.test_raw.TestRaw_param_0_{backend='nvrtc', in_memory=False} testMethod=test_compile_module>  parameter: {'backend': 'nvrtc', 'in_memory': False}>

    @unittest.skipUnless(not cupy.cuda.runtime.is_hip,
                         'only CUDA raises warning')
    @pytest.mark.thread_unsafe(reason="mutates global cache directory")
    def test_compile_module(self):
        module = cupy.RawModule(
            code=_test_compile_src,
            backend=self.backend,
            options=('-DOP=+',),
            jitify=self.jitify)
        log = io.StringIO()
        with use_temporary_cache_dir():
            module.compile(log_stream=log)
>       assert 'warning' in log.getvalue()
E       AssertionError: assert 'warning' in ''
E        +  where '' = <built-in method getvalue of _io.StringIO object at 0x0000016836E8F7F0>()
E        +    where <built-in method getvalue of _io.StringIO object at 0x0000016836E8F7F0> = <_io.StringIO object at 0x0000016836E8F7F0>.getvalue

cupy_tests\core_tests\test_raw.py:1105: AssertionError
____ TestRaw_param_3_{backend='nvcc', in_memory=False}.test_compile_kernel ____

self = <<cupy_tests.core_tests.test_raw.TestRaw_param_3_{backend='nvcc', in_memory=False} testMethod=test_compile_kernel>  parameter: {'backend': 'nvcc', 'in_memory': False}>

    @unittest.skipUnless(not cupy.cuda.runtime.is_hip,
                         'only CUDA raises warning')
    @pytest.mark.thread_unsafe(reason="mutates global cache directory")
    def test_compile_kernel(self):
        kern = cupy.RawKernel(
            _test_compile_src, 'test_op',
            options=('-DOP=+',),
            backend=self.backend,
            jitify=self.jitify)
        log = io.StringIO()
        with use_temporary_cache_dir():
            kern.compile(log_stream=log)
>       assert 'warning' in log.getvalue()
E       AssertionError: assert 'warning' in ''
E        +  where '' = <built-in method getvalue of _io.StringIO object at 0x0000016836E8F880>()
E        +    where <built-in method getvalue of _io.StringIO object at 0x0000016836E8F880> = <_io.StringIO object at 0x0000016836E8F880>.getvalue

cupy_tests\core_tests\test_raw.py:1089: AssertionError
____ TestRaw_param_3_{backend='nvcc', in_memory=False}.test_compile_module ____

self = <<cupy_tests.core_tests.test_raw.TestRaw_param_3_{backend='nvcc', in_memory=False} testMethod=test_compile_module>  parameter: {'backend': 'nvcc', 'in_memory': False}>

    @unittest.skipUnless(not cupy.cuda.runtime.is_hip,
                         'only CUDA raises warning')
    @pytest.mark.thread_unsafe(reason="mutates global cache directory")
    def test_compile_module(self):
        module = cupy.RawModule(
            code=_test_compile_src,
            backend=self.backend,
            options=('-DOP=+',),
            jitify=self.jitify)
        log = io.StringIO()
        with use_temporary_cache_dir():
            module.compile(log_stream=log)
>       assert 'warning' in log.getvalue()
E       AssertionError: assert 'warning' in ''
E        +  where '' = <built-in method getvalue of _io.StringIO object at 0x0000016791682DD0>()
E        +    where <built-in method getvalue of _io.StringIO object at 0x0000016791682DD0> = <_io.StringIO object at 0x0000016791682DD0>.getvalue

cupy_tests\core_tests\test_raw.py:1105: AssertionError
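
One plausible reading of these failures (an inference, not confirmed in the thread) is that a shared cache serves an already-compiled binary, so the compiler never runs and nothing reaches `log_stream`. The toy model below shows that effect in isolation. `DictCache` and `compile_with_cache` are hypothetical stand-ins and do not reflect CuPy's actual compiler code.

```python
import io


class DictCache:
    """Stand-in for a shared (e.g. remote) kernel cache; hypothetical."""

    def __init__(self):
        self._store = {}

    def load(self, key):
        return self._store.get(key)

    def save(self, key, binary):
        self._store[key] = binary


def compile_with_cache(source, cache, log_stream):
    """Compile `source`, writing diagnostics only on a cache miss."""
    cached = cache.load(source)
    if cached is not None:
        # Cache hit: the simulated compiler is skipped, so nothing is logged.
        return cached
    log_stream.write("warning: simulated compiler diagnostic\n")
    binary = b"binary:" + source.encode()
    cache.save(source, binary)
    return binary


cache = DictCache()
first_log, second_log = io.StringIO(), io.StringIO()
compile_with_cache("kernel", cache, first_log)   # miss: warning is logged
compile_with_cache("kernel", cache, second_log)  # hit: log stays empty
```

In this model the second call reproduces the `assert 'warning' in ''` symptom: the log is empty because the cached result short-circuits compilation.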

kmaehashi · 2026-02-28T02:30:30Z

Maybe we can implement a NullCacheBackend and activate it in the context manager instead of mocking the local cache directory.
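
A minimal sketch of what such a no-op backend might look like. The `load`/`save` method names are assumptions made for illustration; the real backend interface from #9737 is not shown in this thread and may differ.

```python
class NullCacheBackend:
    """Cache backend that never stores anything (hypothetical interface)."""

    def load(self, key):
        # Always report a miss so the caller has to compile.
        return None

    def save(self, key, binary):
        # Discard the compiled binary instead of persisting it.
        pass


backend = NullCacheBackend()
backend.save("kernel-hash", b"compiled")
lookup = backend.load("kernel-hash")  # still a miss after saving
```

Because every lookup misses, a test that expects compiler output would always trigger a real compilation while this backend is active.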

seberg · 2026-02-28T09:32:24Z

I think use_temporary_cache_dir could mock cupy.cuda.compiler._kernel_cache_backend to DiskKernelCacheBackend(tmp_dir).

A NullCacheBackend is neater, but maybe it doesn't hurt to just use DiskKernelCacheBackend in a few extra tests anyway? (There is one check to see that the in_memory flag is honored, but that could easily be done with a NullCacheBackend also).
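
The backend-swapping idea above can be sketched without CuPy as follows. Everything here is a stand-in: `compiler` is a plain namespace playing the role of the module that owns the active backend, `DiskBackend` is a hypothetical simplification of a disk-based backend, and `use_temporary_cache_backend` is an illustrative name rather than the existing test helper.

```python
import contextlib
import os
import tempfile
import types
from unittest import mock


class DiskBackend:
    """Hypothetical stand-in for a disk-based kernel cache backend."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir

    def _path(self, key):
        return os.path.join(self.cache_dir, key)

    def load(self, key):
        try:
            with open(self._path(key), "rb") as f:
                return f.read()
        except FileNotFoundError:
            return None

    def save(self, key, binary):
        with open(self._path(key), "wb") as f:
            f.write(binary)


# Stand-in for the module that holds the currently active backend.
compiler = types.SimpleNamespace(_kernel_cache_backend=object())


@contextlib.contextmanager
def use_temporary_cache_backend():
    """Swap in a disk backend rooted at a fresh temporary directory."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        replacement = DiskBackend(tmp_dir)
        with mock.patch.object(compiler, "_kernel_cache_backend", replacement):
            yield tmp_dir
```

`mock.patch.object` restores the previous backend on exit, even if the body raises, so a remote backend configured for CI would be back in place after the test.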

kmaehashi · 2026-03-01T02:19:06Z

/test windows,cuda130

leofang

Thanks, @kmaehashi! Looks like it's working now! I left a question but it is not blocking.

leofang · 2026-03-01T19:48:17Z

+    #    DownloadCache "${cache_pr_gcs_dir}" "${cache_archive}"
+    #}
+
+    $Env:CUPY_CI_ENABLE_GCP_KERNEL_CACHE = "1"


Since the old cache is now working (after #9728), I wonder if we should set this to 1 randomly (with a 50-50 chance) so that both the old cache (the only one used in user land) and the new cache (used only in CI) get tested with equal probability.

@seberg @asi1024 thoughts?

set this to 1 randomly

This will double the number of test runs needed to fully populate the kernel cache. Even with the GCP cache, we have to wait for three runs before all tests succeed. #9738 (comment)
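
The doubling can be checked with a quick expected-value calculation. This sketch assumes each CI run independently enables the GCP cache with a fixed probability and that three cache-enabled runs are needed, as in the observations on this PR; the function name is made up for illustration.

```python
from fractions import Fraction


def expected_ci_runs(enabled_runs_needed, enable_probability):
    """Expected total runs until enough runs had the cache enabled.

    This is the mean number of trials needed for k successes with
    success probability p, which is k / p.
    """
    return Fraction(enabled_runs_needed) / Fraction(enable_probability)


always_on = expected_ci_runs(3, 1)               # cache enabled on every run
coin_flip = expected_ci_runs(3, Fraction(1, 2))  # cache enabled half the time
```

Under these assumptions the always-on setup needs 3 runs, while the coin-flip setup needs 6 on average.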

My feeling is that, if we are worried, we should look into adding some tests rather than keeping two cache mechanisms in CI.

Sounds good. Let's track this in an issue to follow up. No need to block on the merge.

kmaehashi requested a review from a team as a code owner February 21, 2026 10:19

kmaehashi added cat:test Test code / CI to-be-backported Pull-requests to be backported to stable branch prio:medium labels Feb 21, 2026

kmaehashi marked this pull request as draft February 21, 2026 10:20

kmaehashi mentioned this pull request Feb 21, 2026

Add pluggable cache backend abstraction for kernel compilation kmaehashi/cupy#84

Closed

kmaehashi mentioned this pull request Feb 23, 2026

CI: Use GCP-backed kernel cache in Linux CI #9739

Merged

Copilot AI and others added 3 commits February 24, 2026 18:01

introduce GCP-backed kernel cache

f1b4c4e

enable GCP kernel cache in Windows CI

7043722

install google-cloud-storage

c501c24

kmaehashi force-pushed the gcp-backed-cache branch from abeacb7 to c501c24 Compare February 24, 2026 09:01

kmaehashi marked this pull request as ready for review February 24, 2026 09:01

leofang self-assigned this Feb 27, 2026

null cache mock

4cf6195

leofang approved these changes Mar 1, 2026

leofang merged commit 8862fea into cupy:main Mar 3, 2026
70 checks passed

chainer-ci pushed a commit to chainer-ci/cupy that referenced this pull request Mar 3, 2026

Merge pull request cupy#9738 from kmaehashi/gcp-backed-cache

32f56b1

CI: Use GCP-backed kernel cache in Windows CI

chainer-ci mentioned this pull request Mar 3, 2026

[backport] CI: Use GCP-backed kernel cache in Windows CI #9761

Merged

leofang mentioned this pull request Mar 5, 2026

CI: Improve how kernel caches are updated/pruned #9665

Closed

leofang linked an issue Mar 5, 2026 that may be closed by this pull request

CI: Improve how kernel caches are updated/pruned #9665

Closed

leofang added this to the v15.0.0a1 milestone Mar 7, 2026

kmaehashi mentioned this pull request May 30, 2026

[v14] Bump version to v14.1.1 #9971

Merged
