
Make sure bulk reading works with prefetching#16640

Open
ktf wants to merge 8 commits into root-project:master from ktf:fix-prefetch

Conversation

@ktf
Contributor

@ktf ktf commented Oct 8, 2024

This is currently not the case when the branch has more than one basket.

This PR fixes #8962.

@ktf ktf requested a review from pcanal as a code owner October 8, 2024 19:35
@ktf
Contributor Author

ktf commented Oct 8, 2024

@pcanal, this makes things work for me, although I am not sure it's the correct thing to do. I think the real issue is https://github.com/root-project/root/blob/master/tree/tree/src/TBranch.cxx#L1413, but I couldn't quite understand why / how to fix it.

@pcanal
Member

pcanal commented Oct 8, 2024

@ktf (I can reproduce the problem now.) Commit b2c6904 seems to have broken the expectation of only one basket in flight. I also wonder how it interacts with the interface; in particular, while it is clear that this PR removes the crashes, does it lead to correct data being read through the bulk interface?

@github-actions

github-actions bot commented Oct 8, 2024

Test Results

17 files, 17 suites, 3d 10h 37m 15s ⏱️
2 714 tests: 2 707 ✅, 0 💤, 7 ❌
43 528 runs: 43 521 ✅, 0 💤, 7 ❌

For more details on these failures, see this check.

Results for commit 75a6cbf.

♻️ This comment has been updated with latest results.

@ktf ktf force-pushed the fix-prefetch branch 2 times, most recently from 4d7941f to e8bf8d3 Compare October 9, 2024 06:58
@ktf
Contributor Author

ktf commented Oct 9, 2024

Some tests are failing with `Error: Value cannot be null. (Parameter 'ContainerId')`; not sure if it is my fault...

@ktf
Contributor Author

ktf commented Oct 9, 2024

@pcanal Indeed, this still breaks my (more elaborate) integration tests...

@ktf
Contributor Author

ktf commented Oct 9, 2024

Never mind, I forgot to rebuild something, and this does indeed break ABI compatibility because of the newly introduced vector.

@pcanal pcanal requested a review from dpiparo as a code owner October 9, 2024 12:32
@pcanal
Member

pcanal commented Oct 9, 2024

@ktf Can you test the additional commit that I pushed here? This is another API-breaking change (new function).

@ktf
Contributor Author

ktf commented Oct 9, 2024

Thanks. I am doing so right now. I should have something in an hour or so.

@ktf
Contributor Author

ktf commented Oct 9, 2024

This seems to leak somewhere. I have:

[screenshot: memory-usage plot]

where it used to be flat.

@pcanal
Member

pcanal commented Oct 9, 2024

This seems to leak somewhere.

Yep, I can reproduce it .... investigating.

@pcanal
Member

pcanal commented Oct 9, 2024

I pushed a fix to the memory leak.

@pcanal
Member

pcanal commented Oct 9, 2024

Unfortunately the Windows failure is related. I am checking.

@ktf
Contributor Author

ktf commented Oct 9, 2024

On the other hand, it seems to have fixed the leak and memory usage is back to expected.

ktf and others added 3 commits October 10, 2024 17:12

- This is currently not the case when the branch has more than 1 basket.
- To be used instead of TBuffer::SetBuffer when the incoming buffer is actually coming from a TBuffer[File]. This will reduce memory churn.
@pcanal
Member

pcanal commented Oct 10, 2024

@ktf @dpiparo The only problem left is with the test itself: when prefetching is enabled, it requires more than 2 GB of memory and thus fails on 32-bit platforms. I.e. @ktf, you can use the PR as-is if need be; it shall be merged soon.

@pcanal
Member

pcanal commented Oct 10, 2024

Actually, this is fixed but might still not be doing what is meant :( ... and I am not sure you actually need this feature.

What do you intend to gain by calling SetClusterPrefetch?

The feature enabled by SetClusterPrefetch is to load all the baskets of the cluster in memory, so that within a cluster you have cheap random access to the entries (instead of having to decompress them again and again).

(At least) one optimization in place actually runs counter to this and needs to be removed: it avoids a memory copy by handing the uncompressed buffer back to the user as-is ... but then the buffer is no longer there for later use.

And in essence the fix we have here is also incorrect :(. When 'ClusterPrefetching' is on, we should always leave the basket as-is in the list of baskets for the next call to possibly reuse (at least until the end of the cluster).

So you could indeed proceed with using this: it functions (returns the right result) but does not yet implement the ClusterPrefetching optimization (i.e. does not do what it is supposed to do), so you could also just as well turn it off (temporarily).

@ktf
Contributor Author

ktf commented Oct 11, 2024

I was enabling SetClusterPrefetch as part of the attempt to reduce read_calls when processing our AODs.

Indeed I now notice that it's enough to simply do:

```cpp
// Was affected by https://github.com/root-project/root/issues/8962
// Re-enabling this seems to cut the number of IOPS in half
tree->SetCacheSize(25000000);
//tree->SetClusterPrefetch(true);
for (auto& reader : mBranchReaders) {
   tree->AddBranchToCache(reader->branch());
}
tree->StopCacheLearningPhase();
```
to obtain the same result, so I am fine to simply disable it for now. Do I understand correctly that I still need this patch, though, in case there is more than one basket?

@pcanal
Member

pcanal commented Oct 11, 2024

I was enabling SetClusterPrefetch as part of the attempt to reduce read_calls when processing our AODs.

A priori, it is not intended to have an effect on that.

Indeed I now notice that it's enough to simply do:

What is the change (increase of the cache size or explicit cache learning or both)?

@ktf
Contributor Author

ktf commented Oct 11, 2024

What is the change (increase of the cache size or explicit cache learning or both)?

Doing the caching at all. I thought prefetching was part of it, but apparently it is not.

@pcanal
Member

pcanal commented Oct 11, 2024

I thought prefetching was part of it, but apparently it is not.

The names are really confusing ... sorry.

  • SetCacheSize enables/extends the TTreeCache, whose 'real' job is to prefetch (grab early from disk) the compressed data. This is the tuning that controls the size of the reads from disk.
  • SetClusterPrefetch enables the early decompression of the baskets of the current cluster (whose compressed data is already in memory if used in conjunction with the TTreeCache). This affects performance only in conjunction with non-sequential use/load/read of the entries.
  • gEnv->SetValue("TFile.AsyncPrefetching", 1), in conjunction with a TFileCacheRead (for example the TTreeCache), will asynchronously grab the compressed data of the 'next' cluster early, while the current cluster is being processed (i.e. is the subject of GetEntry).

This latter setting might be of interest in your case.
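Putting the three knobs together, a cache setup might look like the following sketch (hypothetical helper function; the cache size and the wildcard branch selection are placeholder choices, not values from this discussion):

```cpp
#include <TEnv.h>
#include <TTree.h>

// Hedged sketch: configure a TTree for prefetched reading.
void ConfigureCache(TTree *tree)
{
   // (1) TTreeCache: prefetch compressed data from disk in large reads.
   tree->SetCacheSize(25000000);

   // (2) Eagerly decompress all baskets of the current cluster; only useful
   //     for non-sequential access to entries within a cluster.
   // tree->SetClusterPrefetch(true);

   // (3) Asynchronously read the next cluster's compressed data while the
   //     current cluster is being processed.
   gEnv->SetValue("TFile.AsyncPrefetching", 1);

   tree->AddBranchToCache("*");        // cache all branches (placeholder choice)
   tree->StopCacheLearningPhase();
}
```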

@ktf
Contributor Author

ktf commented Oct 11, 2024

Ok, thank you for your explanations. I can confirm that (1) already works for me. I will try (3). (2) I probably do not need, actually.

Instead of relying on TBranch::fBasketSize, use the actual compressed/on-disk size of the buffer to
decide on the size allocated for the TBufferFile of the baskets.  In particular, in the example
testBulkApi, there is a file with only one float branch with an initial basket size
of 320k.  There is enough data stored to reach just past 32MB (the data happens to not be
compressed), so OptimizeBaskets is called and settles on a basket size of 25MB.
The file is thus composed of 93 baskets of 320k for the first cluster and a single basket
for the second cluster.

In the previous code (before this commit), any basket created allocates a TBufferFile
of size 'branch->GetBasketSize()'.

So in the example above, for the first few baskets we allocated 25MB instead of the
required 320k.

This becomes a real issue if we also turn on `SetClusterPrefetch`, which loads
in memory all the baskets of a cluster and thus, for our example,
will allocate 25MB * 93, i.e. more than 2GB! (instead of the 29MB needed).
This now actually implements the purpose of `ClusterPrefetch`, which is to keep the baskets in memory
to support random access within a cluster.


Development

Successfully merging this pull request may close these issues.

SetClusterPrefetch(true) breaks BulkIO with more than one basket

2 participants