[c++] Implement experimental Memory Pool for `ManagedQuery` read operation by XanthosXanthopoulos · Pull Request #4299 · single-cell-data/TileDB-SOMA

XanthosXanthopoulos · 2025-10-29T14:06:23Z

Issue and/or context:

Changes:
This PR introduces a new experimental memory allocation mechanism for read operations. Instead of a per-column memory budget, the new mechanism uses a global memory budget, which is split across the columns in proportion to each column's byte size.

The new mechanism allows for more predictable memory usage which is independent of the underlying array schemas.

Additionally, it changes how Arrow arrays are constructed, regardless of the memory allocation method. Each Arrow table allocates its own buffers and copies data from the ManagedQuery read buffers, which are reused for all incomplete reads. Dictionary arrays are now allocated in the same way as ordinary arrays and can be reused across different columns if they use the same Enumeration.
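The copy-out pattern described above can be sketched as follows. This is a minimal illustration and not the code from this PR: `ReadBuffer`, `OwnedBuffer`, and `copy_out` are hypothetical names, and the real ManagedQuery and Arrow buffer types are not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a read buffer that is allocated once and then
// refilled by every incomplete read. Not the actual ManagedQuery type.
struct ReadBuffer {
    std::vector<std::uint8_t> data;  // fixed-size scratch space
    std::size_t valid_bytes = 0;     // bytes produced by the latest read
};

// Hypothetical stand-in for a buffer owned by a single Arrow array.
struct OwnedBuffer {
    std::vector<std::uint8_t> data;
};

// Copy only the bytes produced by the latest read into a fresh allocation,
// so the read buffer can be overwritten by the next incomplete read.
OwnedBuffer copy_out(const ReadBuffer& rb) {
    assert(rb.valid_bytes <= rb.data.size());
    OwnedBuffer out;
    out.data.assign(rb.data.begin(), rb.data.begin() + rb.valid_bytes);
    return out;
}
```

Because each result owns its bytes, a batch handed to the caller stays valid after the shared read buffer is refilled. The cost is one extra copy per batch.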

Notes for Reviewer:
The new mechanism also changes the way memory is allocated (using resize instead of reserve to avoid UB) and requires lower limits. To enable the new mechanism, set soma.read.use_memory_pool to true and set soma.read.memory_budget to the desired budget (defaults to 256MB).
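The reserve versus resize point can be illustrated with a plain `std::vector`. This is a sketch of the general C++ rule, assuming the buffers are vectors that an external writer fills through a raw pointer; `make_read_buffer` is a hypothetical helper, not a function from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Allocate a buffer that an external writer (for example, a C API that takes
// a raw pointer and a byte count) may fill through data().
std::vector<std::uint8_t> make_read_buffer(std::size_t num_bytes) {
    std::vector<std::uint8_t> buf;
    // buf.reserve(num_bytes) would only raise capacity(); size() would stay 0,
    // so no elements would exist in [0, num_bytes) and touching them is UB.
    buf.resize(num_bytes);  // creates num_bytes value-initialized elements
    return buf;
}
```

After `resize`, `size()` reports the full length, so indexing, iteration, and copies over the filled range are all well-defined.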

codecov · 2025-10-29T14:35:58Z

Codecov Report

❌ Patch coverage is 7.44186% with 199 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.74%. Comparing base (ac5e0c5) to head (510955e).
⚠️ Report is 50 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #4299      +/-   ##
==========================================
- Coverage   76.14%   75.74%   -0.41%     
==========================================
  Files         233      233              
  Lines       32591    32767     +176     
  Branches     1236     1259      +23     
==========================================
+ Hits        24818    24820       +2     
- Misses       7349     7520     +171     
- Partials      424      427       +3

Flag	Coverage Δ
libtiledbsoma	`56.36% <7.44%> (-0.81%)`	⬇️
python	`89.45% <ø> (-0.03%)`	⬇️
r	`85.54% <ø> (ø)`


Components	Coverage Δ
python_api	`89.45% <ø> (-0.03%)`	⬇️
libtiledbsoma	`44.14% <7.44%> (-1.76%)`	⬇️


jp-dark

If this works like I think it will, it should be a big improvement, although I still need to dig a little deeper into some of the nuance. It also might be worth flipping it on by default and running the unit tests against it fully enabled.

jp-dark · 2025-10-30T21:10:48Z

+        data_.reserve(num_bytes);
+        if (is_var_) {
+            offsets_.reserve(num_cells + 1);  // extra offset for arrow
+        }
+        if (is_nullable_) {
+            validity_.reserve(num_cells);
+        }


It would be best to address these before another release. If we don't fix them here, we should file a follow-up Linear issue.

bkmartinjr · 2025-11-03T17:20:49Z

+                                                            tiledb::impl::type_size(attr.type())) *
+                                   memory_budget_unit;
+
+            size_t num_cells = memory_budget_unit;


Isn't this calculating the number of bytes rather than the number of cells? (Apologies if I'm misreading the code, but the units look like bytes.)

XanthosXanthopoulos · 2025-11-03

In this case memory_budget_unit is also the number of cells, because of how memory is allocated per column: for any given column, the allocated budget is memory_budget_unit * (element_size + is_var_size * 8 + is_nullable * 1).
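The arithmetic in this reply can be worked through with a small sketch. The per-column formula is taken from the reply; how the unit itself is derived from the global budget is an assumption here (the budget divided by the summed per-cell cost of all columns), and the expansion factor for variable-size columns mentioned in the commit list is ignored. `ColumnSpec` and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical column description; field names are illustrative only.
struct ColumnSpec {
    std::size_t element_size;  // bytes per value
    bool is_var;               // variable-length: one 8-byte offset per cell
    bool is_nullable;          // nullable: one validity byte per cell
};

// Per-cell cost in bytes, following the formula quoted in the reply.
std::size_t bytes_per_cell(const ColumnSpec& c) {
    return c.element_size + (c.is_var ? 8 : 0) + (c.is_nullable ? 1 : 0);
}

// ASSUMPTION: the unit is the global budget divided by the summed per-cell
// cost, so that the per-column budgets add up to at most the global budget.
std::size_t memory_budget_unit(std::size_t global_budget,
                               const std::vector<ColumnSpec>& cols) {
    std::size_t total = 0;
    for (const ColumnSpec& c : cols) {
        total += bytes_per_cell(c);
    }
    assert(total > 0);
    return global_budget / total;
}

// Budget for one column: unit * per-cell cost, so the unit is also the
// number of cells that fit in every column's buffers.
std::size_t column_budget(std::size_t unit, const ColumnSpec& c) {
    return unit * bytes_per_cell(c);
}
```

For example, with a 4-byte fixed column and a 1-byte nullable variable-length column, the per-cell costs are 4 and 10 bytes. Under the stated assumption, a budget of 1400 bytes gives a unit of 100 cells and column budgets of 400 and 1000 bytes.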

bkmartinjr · 2025-11-03T17:23:18Z

        return buffers_.at(names_.front())->size();
    }

+    static bool use_memory_pool(const std::shared_ptr<tiledb::Array>& array);


add docstring

bkmartinjr · 2025-11-03T17:30:55Z

Question: do all of the unit tests pass if soma.read.use_memory_pool is set to true?

bkmartinjr

Overall this looks good (two very small nits noted inline). I'd like to confirm that our unit tests run with soma.read.use_memory_pool enabled.

Other than that, I'm happy with it landing under this feature flag. Let's coordinate on a run of the benchmark suite once it is in main.

bkmartinjr

I belatedly noticed this is lacking an entry in HISTORY.md and the equivalent R file

XanthosXanthopoulos requested review from bkmartinjr and jp-dark October 29, 2025 14:06

jp-dark changed the title ~~[c++] Implement experimenta Memory Pool for ManagedQuery read operation~~ [c++] Implement experimental Memory Pool for ManagedQuery read operation Oct 30, 2025

jp-dark reviewed Oct 30, 2025

bkmartinjr reviewed Nov 3, 2025

XanthosXanthopoulos added 10 commits November 3, 2025 20:18

Add experimental memory pool for reads

9586ad2

Add optional flag to change buffer allocation method

ce05b64

Copy data to dedicated arrow buffers and reuse the managed query buff…

8cf0bf4

…ers for reading data

Fic compiler worning

b1c4c83

Lint fixes

6873154

Add some test with the experimental read enabled

b12d355

Lint fixes

665ac7c

Address review comments

e425e64

Include offset and validity buffers in memory budget, add expansion f…

1fa5be0

…actor for variable size columns

Add a name property for arrow buffers

ac15bc8

XanthosXanthopoulos force-pushed the xan/SOMA-504 branch from 58c0c09 to ac15bc8 Compare November 3, 2025 18:19

XanthosXanthopoulos added 3 commits November 4, 2025 19:19

Replace shared pointers with references

03e6493

Add missing docstring

0bd45a8

Update changelog

510955e

XanthosXanthopoulos marked this pull request as ready for review November 4, 2025 20:47

jp-dark approved these changes Nov 4, 2025

XanthosXanthopoulos merged commit 109bba8 into main Nov 5, 2025
24 checks passed

XanthosXanthopoulos deleted the xan/SOMA-504 branch November 5, 2025 00:52

XanthosXanthopoulos mentioned this pull request Dec 12, 2025

[c++] Provide different modes of handling memory when converting to Arrow objects #4334

Merged

3 participants
