Skip to content

[c++] Implement experimental Memory Pool for ManagedQuery read operation#4299

Merged
XanthosXanthopoulos merged 13 commits intomainfrom
xan/SOMA-504
Nov 5, 2025
Merged

[c++] Implement experimental Memory Pool for ManagedQuery read operation#4299
XanthosXanthopoulos merged 13 commits intomainfrom
xan/SOMA-504

Conversation

@XanthosXanthopoulos
Copy link
Copy Markdown
Collaborator

Issue and/or context:

Changes:
This PR introduces a new experimental memory allocation mechanism for read operations. Instead of a per column memory budget, the new mechanism uses a global memory budget which is then split to each column based on the column byte size.

The new mechanism allows for more predictable memory usage which is independent of the underlying array schemas.

Additionally it changes how arrow arrays are constructed regardless of the memory allocation method. Each arrow table allocates its own buffers and copies data from the ManagedQuery read buffers which are reused for all incomplete reads. Additionally dictionary arrays are now allocated similarly to ordinary arrays and can be reused across different columns if the use the same Enumeration

Notes for Reviewer:
The new mechanism also changes the way memory is allocated (using resize instead of reserve to avoid UB) and requires lower limits. To enable enable the new mechanism you need to set soma.read.use_memory_pool to true and soma.read.memory_budget (defaults to 256MB)

@codecov
Copy link
Copy Markdown

codecov Bot commented Oct 29, 2025

Codecov Report

❌ Patch coverage is 7.44186% with 199 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.74%. Comparing base (ac5e0c5) to head (510955e).
⚠️ Report is 50 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4299      +/-   ##
==========================================
- Coverage   76.14%   75.74%   -0.41%     
==========================================
  Files         233      233              
  Lines       32591    32767     +176     
  Branches     1236     1259      +23     
==========================================
+ Hits        24818    24820       +2     
- Misses       7349     7520     +171     
- Partials      424      427       +3     
Flag Coverage Δ
libtiledbsoma 56.36% <7.44%> (-0.81%) ⬇️
python 89.45% <ø> (-0.03%) ⬇️
r 85.54% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
python_api 89.45% <ø> (-0.03%) ⬇️
libtiledbsoma 44.14% <7.44%> (-1.76%) ⬇️
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jp-dark jp-dark changed the title [c++] Implement experimenta Memory Pool for ManagedQuery read operation [c++] Implement experimental Memory Pool for ManagedQuery read operation Oct 30, 2025
Copy link
Copy Markdown
Collaborator

@jp-dark jp-dark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this works like I think it will, it should be a big improvement, although I still need to dig a little deeper into some of the nuance. It also might be worth flipping it on by default and running the unit tests against it fully enabled.

Comment thread libtiledbsoma/src/utils/arrow_adapter.cc Outdated
Comment thread libtiledbsoma/src/utils/arrow_adapter.cc Outdated
Comment thread libtiledbsoma/src/soma/array_buffers.cc Outdated
Comment thread libtiledbsoma/src/soma/array_buffers.cc Outdated
Comment on lines +120 to +126
data_.reserve(num_bytes);
if (is_var_) {
offsets_.reserve(num_cells + 1); // extra offset for arrow
}
if (is_nullable_) {
validity_.reserve(num_cells);
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be best to address these before another release. If we don't fix them here, we should file a follow-up linear issue.

Comment thread libtiledbsoma/src/utils/arrow_adapter.h Outdated
Comment thread libtiledbsoma/src/soma/array_buffers.cc
Comment thread libtiledbsoma/src/soma/array_buffers.cc Outdated
Comment thread libtiledbsoma/src/soma/managed_query.cc
tiledb::impl::type_size(attr.type())) *
memory_budget_unit;

size_t num_cells = memory_budget_unit;
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't this calculating num bytes, rather than num cells? (apologies if I'm misreading the code - but the units look like bytes)

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case memory_budget_unit is also the number of cells due to how memory per column is allocated because for any given column the allocated budget is memory_budget_unit * (element_size + is_var_size * 8 + is_nullable * 1)

return buffers_.at(names_.front())->size();
}

static bool use_memory_pool(const std::shared_ptr<tiledb::Array>& array);
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add docstring

@bkmartinjr
Copy link
Copy Markdown
Member

Question: do all of the unit tests pass if soma.read.use_memory_pool is set to true?

Copy link
Copy Markdown
Member

@bkmartinjr bkmartinjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall this looks good (two very small nits noted inline). I'd like to confirm that our unit tests run with soma.read.use_memory_pool enabled.

Other than that, I'm happy with it landing under this feature flag. Lets coordinate on a run of the benchmark suite once it is in main.

Copy link
Copy Markdown
Member

@bkmartinjr bkmartinjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I belatedly noticed this is lacking an entry in HISTORY.md and the equivalent R file

@XanthosXanthopoulos XanthosXanthopoulos marked this pull request as ready for review November 4, 2025 20:47
@XanthosXanthopoulos XanthosXanthopoulos merged commit 109bba8 into main Nov 5, 2025
24 checks passed
@XanthosXanthopoulos XanthosXanthopoulos deleted the xan/SOMA-504 branch November 5, 2025 00:52
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants