Allocate memory for image cache once.#1508
Conversation
…eck for nullptr to see if cache filled.
|
@veggiesaurus @kswang1029 Here's some profiling output (animating channels in a 20480 x 20480 x 10 image).
This branch: |
|
@confluence can you run with psrecord to make some pretty graphs? |
|
@veggiesaurus here you go. Note the scale of the memory usage -- because of the reallocation, we're actually frequently using twice the size of one channel, because the previous channel's memory hasn't been freed yet when memory is allocated for the new channel. Dev:
This branch:
|
There was a problem hiding this comment.
this appears fine. No significant performance improvements nor degradation is observed e2e (maybe due to the test image size I used). All e2e tests are passing.
before
[2025-09-22 07:43:47.219Z] [debug] [performance] Load 32768x32768 image to cache in 1879.371 ms at 571.330 MPix/s
after
[2025-09-22 07:45:29.273Z] [debug] [performance] Load 32768x32768 image to cache in 1544.169 ms at 695.353 MPix/s
@confluence were you testing with a large cube and switching channels one by one or using animation playback? I am trying to see if I can get a similar performance boost like you obtained (~0.5s per channel).
@confluence By comparing these two psrecord plots, I see the e2e time is similar (this PR performs slightly better then dev). But in the performance log you showed, the cache loading performance improves significantly (~40%). Does this imply something else slow it down (generating tiles?). Or we cannot compare the two psrecord plots directly? |
The e2e times of the two plots can't be compared directly -- this was a very unscientific manual test, not a script; I think that in the second plot I may have let it run through the animation for longer. You can get a better idea of the time difference if you compare the 10 tallest CPU spikes at the beginning of each plot -- that's where the 10 channels are being read from disk for the first time; the first one when the image is first loaded, and the other 9 when the animation starts (after that they're cached). You can see how these are more compressed in the second plot. The impact really becomes noticeable when a single channel is large, and especially during animation, when the channel is changed frequently. But there's a separate effect of the channels being loaded more quickly once they're cached (as you can see from the plots), so you need to let the animation loop at least once in both branches to compare like to like. (There may be other factors affecting the speed comparison in the tests; e.g. the hardware on which they're running, the filesystem, OS caching, etc.. My test was done on a Thinkpad with an SSD running Linux.) |
kswang1029
left a comment
There was a problem hiding this comment.
I tested with a 30000x30000x5 cube and now see the improvements 👍
- memory allocation issue is fixed
- performance improvements about 20% in animation playback (loop twice).
Performance data and graphs are shown below.
dev
[2025-09-23 03:13:03.779Z] [debug] [performance] Load 30000x30000 image to cache in 1591.193 ms at 565.613 MPix/s
[2025-09-23 03:13:09.122Z] [debug] [performance] Load 30000x30000 image to cache in 1472.268 ms at 611.302 MPix/s
[2025-09-23 03:13:09.394Z] [debug] [performance] Animator: Change frame in 1744.371 ms
[2025-09-23 03:13:11.128Z] [debug] [performance] Load 30000x30000 image to cache in 1529.537 ms at 588.413 MPix/s
[2025-09-23 03:13:11.443Z] [debug] [performance] Animator: Change frame in 1844.232 ms
[2025-09-23 03:13:13.558Z] [debug] [performance] Load 30000x30000 image to cache in 1910.128 ms at 471.173 MPix/s
[2025-09-23 03:13:13.895Z] [debug] [performance] Animator: Change frame in 2247.120 ms
[2025-09-23 03:13:15.754Z] [debug] [performance] Load 30000x30000 image to cache in 1654.072 ms at 544.112 MPix/s
[2025-09-23 03:13:16.070Z] [debug] [performance] Animator: Change frame in 1969.322 ms
[2025-09-23 03:13:17.926Z] [debug] [performance] Load 30000x30000 image to cache in 1650.881 ms at 545.163 MPix/s
[2025-09-23 03:13:18.021Z] [debug] [performance] Animator: Change frame in 1746.546 ms
[2025-09-23 03:13:19.867Z] [debug] [performance] Load 30000x30000 image to cache in 1640.883 ms at 548.485 MPix/s
[2025-09-23 03:13:19.963Z] [debug] [performance] Animator: Change frame in 1736.510 ms
[2025-09-23 03:13:21.804Z] [debug] [performance] Load 30000x30000 image to cache in 1636.437 ms at 549.975 MPix/s
[2025-09-23 03:13:21.898Z] [debug] [performance] Animator: Change frame in 1730.108 ms
[2025-09-23 03:13:23.746Z] [debug] [performance] Load 30000x30000 image to cache in 1643.188 ms at 547.716 MPix/s
[2025-09-23 03:13:23.836Z] [debug] [performance] Animator: Change frame in 1732.794 ms
[2025-09-23 03:13:25.670Z] [debug] [performance] Load 30000x30000 image to cache in 1632.831 ms at 551.190 MPix/s
[2025-09-23 03:13:25.762Z] [debug] [performance] Animator: Change frame in 1724.426 ms
[2025-09-23 03:13:27.618Z] [debug] [performance] Load 30000x30000 image to cache in 1651.432 ms at 544.982 MPix/s
[2025-09-23 03:13:27.709Z] [debug] [performance] Animator: Change frame in 1742.017 ms
[2025-09-23 03:13:29.596Z] [debug] [performance] Load 30000x30000 image to cache in 1683.511 ms at 534.597 MPix/s
[2025-09-23 03:13:31.110Z] [debug] [performance] Load 30000x30000 image to cache in 1514.208 ms at 594.370 MPix/s
This PR
[2025-09-23 03:21:42.623Z] [debug] [performance] Load 30000x30000 image to cache in 1268.583 ms at 709.453 MPix/s
[2025-09-23 03:21:46.745Z] [debug] [performance] Load 30000x30000 image to cache in 1216.876 ms at 739.599 MPix/s
[2025-09-23 03:21:47.045Z] [debug] [performance] Animator: Change frame in 1516.889 ms
[2025-09-23 03:21:48.462Z] [debug] [performance] Load 30000x30000 image to cache in 1216.676 ms at 739.720 MPix/s
[2025-09-23 03:21:48.771Z] [debug] [performance] Animator: Change frame in 1526.329 ms
[2025-09-23 03:21:50.235Z] [debug] [performance] Load 30000x30000 image to cache in 1263.076 ms at 712.546 MPix/s
[2025-09-23 03:21:50.533Z] [debug] [performance] Animator: Change frame in 1561.704 ms
[2025-09-23 03:21:52.028Z] [debug] [performance] Load 30000x30000 image to cache in 1289.590 ms at 697.896 MPix/s
[2025-09-23 03:21:52.350Z] [debug] [performance] Animator: Change frame in 1611.566 ms
[2025-09-23 03:21:53.865Z] [debug] [performance] Load 30000x30000 image to cache in 1314.666 ms at 684.585 MPix/s
[2025-09-23 03:21:53.954Z] [debug] [performance] Animator: Change frame in 1403.840 ms
[2025-09-23 03:21:55.472Z] [debug] [performance] Load 30000x30000 image to cache in 1313.366 ms at 685.262 MPix/s
[2025-09-23 03:21:55.564Z] [debug] [performance] Animator: Change frame in 1406.071 ms
[2025-09-23 03:21:57.073Z] [debug] [performance] Load 30000x30000 image to cache in 1304.348 ms at 690.000 MPix/s
[2025-09-23 03:21:57.172Z] [debug] [performance] Animator: Change frame in 1403.104 ms
[2025-09-23 03:21:58.657Z] [debug] [performance] Load 30000x30000 image to cache in 1280.149 ms at 703.043 MPix/s
[2025-09-23 03:21:58.754Z] [debug] [performance] Animator: Change frame in 1376.626 ms
[2025-09-23 03:22:00.270Z] [debug] [performance] Load 30000x30000 image to cache in 1311.007 ms at 686.495 MPix/s
[2025-09-23 03:22:00.367Z] [debug] [performance] Animator: Change frame in 1408.096 ms
[2025-09-23 03:22:01.873Z] [debug] [performance] Load 30000x30000 image to cache in 1305.621 ms at 689.327 MPix/s
[2025-09-23 03:22:01.955Z] [debug] [performance] Animator: Change frame in 1387.358 ms
[2025-09-23 03:22:03.483Z] [debug] [performance] Load 30000x30000 image to cache in 1322.910 ms at 680.318 MPix/s
[2025-09-23 03:22:04.577Z] [debug] [performance] Load 30000x30000 image to cache in 1094.437 ms at 822.341 MPix/s
|
@confluence based on the test results above, it seems when we trigger animation playback, the image data of the starting channel will be loaded into memory again and generate the tiles with the compression quality for animation. Could you confirm this behaviour? I think maybe we can bypass this loading step and proceed with tile generation directly. When we stop the animation, it seems also the stopping channel will be loaded again for generating tiles with the compression quality for images. Do you think it is also possible to bypass the loading step? These duplications becomes much noticeable when the per-channel image size is large enough. Do you think we need a separate issue to improve this part? |
|
@kswang1029 I'm not sure if that's really what's happening -- if yes, there may be a race condition in the cache filling code. The change in compression shouldn't have any effect on the cache; the same full resolution data for the channel is always loaded. In both graphs, you can see five taller CPU spikes in sequence at the start, which is consistent with the five channels being loaded for the first time -- if the first channel were being loaded again, I would expect to see an additional shorter spike between the first two. I'm going to add the channel index to the debugging output so that we can see what is actually going on. If this is a bug, I think we should handle it in a separate issue. |
@confluence ok I can redo the test once channel index is added to performance log. |
|
@kswang1029 OK, I've added the Z and polarization indices to the log. As far as I can see in my local tests, nothing weird is happening at the beginning (the first channel loaded after animation starts is index 1, as expected). What's happening at the end appears to be that during animation the data for the next channel is being requested in advance -- so when animation stops, the backend has to backtrack to reload the previous channel into the cache, because that's the "current" channel that the animation has actually stopped on. I believe that the pre-fetching is an intentional optimisation -- @veggiesaurus does that sound right, or is this actually a bug? |
Ok animation playback does pre-fetch 2 channels. So I guess that’s what I saw in the performance log. Will redo the test as a confirmation. |
|
@confluence thanks for adding the channel and stokes indices. The performance log is now sensible and all good. |
|
I've realised that this needs to account for the case of the cache always being loaded to generate contours in an HDF5 file. I guess that the cleanest and most future-proof way to do this would be to allocate the memory conditionally in the fill function, but if the contours are the only special case, perhaps there should be custom code there, to avoid a mostly unnecessary null pointer check every time we fill the cache. |
…. Fixed bad logic in cache validity checks.
|
I moved the allocation code back into |
I think this is fine. It's literally a single arithmetic operation 😆 |
|
* Allocate memory for image cache once. Image size should be size_t. Check for nullptr to see if cache filled. * added to changelog * Added channel and polarization indices to performance log for image cache fill * Moved cache allocation back to fill function, but made it conditional. Fixed bad logic in cache validity checks. * restored original constructor code
* Allocate memory for image cache once. (#1508) * Allocate memory for image cache once. Image size should be size_t. Check for nullptr to see if cache filled. * added to changelog * Added channel and polarization indices to performance log for image cache fill * Moved cache allocation back to fill function, but made it conditional. Fixed bad logic in cache validity checks. * restored original constructor code * attempt to remove image cache from core dump (#1506) * attempt to remove image cache from core dump * Check for platform compatibility before using madvise * fix included headers; formatting pass * Added error handling * working solution with aligned memory; needs tidying up * Encapsulated page alignment and madvise functionality in custom alias and function. Added changelog and docstrings. * Fixed docstring issues * fixed typo in docstring * Added a basic unit test. --------- Co-authored-by: Adrianna Pińska <[email protected]> --------- Co-authored-by: Angus Comrie <[email protected]>




Description
This fixes the unnecessary reallocation of the image cache memory every time the cache is filled. The cache code will be refactored soon, but there's no reason not to fix this in the meantime.
_image_cache_sizeshould never have been a signed long long. On 64-bit architecturessize_tshould be equivalent to unsigned long long. We pass this value to functions which take asize_tparameter.nullptrthan to check if the size is zero.Checklist