Skip to content

Make task splitting in TTreeProcessorMT more sensible with many input files.#7106

Merged
eguiraud merged 5 commits intoroot-project:masterfrom
bendavid:improvetasksplitting
Feb 19, 2021
Merged

Make task splitting in TTreeProcessorMT more sensible with many input files.#7106
eguiraud merged 5 commits intoroot-project:masterfrom
bendavid:improvetasksplitting

Conversation

@bendavid
Copy link
Copy Markdown
Contributor

Previously if running with a large number of input files (few hundred or even a thousand+ could be typical for large datasets processed into flat trees with grid jobs) one could end up with a huge number of tasks (at minimum nThreads * nFiles).

This makes the splitting aware of the number of files, taken into account for both the default splitting and the function used for user configuration, which also changes name.

@bendavid bendavid requested a review from pcanal as a code owner January 27, 2021 17:18
@phsft-bot
Copy link
Copy Markdown

Can one of the admins verify this patch?

@bendavid
Copy link
Copy Markdown
Contributor Author

Let me know what the preferred way would be to add a deprecation warning if a user calls SetMaxTasksPerFilePerWorker, since they should now call SetTasksPerWorker instead.

@oshadura
Copy link
Copy Markdown
Collaborator

@phsft-bot build!

@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-fedora30/cxx14, ROOT-fedora31/noimt, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-fedora30/cxx14.
Running on root-fedora30-1.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on mac1014/python3.
Running on macphsft17.dyndns.cern.ch:/build/jenkins/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-performance-centos8-multicore/default.
Running on null:/data/sftnight/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-debian10-i386/cxx14.
Running on pcepsft10.dyndns.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-ubuntu16/nortcxxmod.
Running on sft-ubuntu-1604-3.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on mac11.0/cxx17.
Running on macphsft23.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@eguiraud eguiraud assigned eguiraud and unassigned pcanal Jan 29, 2021
@eguiraud
Copy link
Copy Markdown
Contributor

eguiraud commented Jan 29, 2021

Hi @bendavid , thanks a lot, I will take a look asap (I see that the test failures are desired -- I can take care of updating the corresponding roottest test).

@eguiraud eguiraud requested review from eguiraud and removed request for pcanal January 29, 2021 14:01
@phsft-bot
Copy link
Copy Markdown

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

Copy link
Copy Markdown
Contributor

@eguiraud eguiraud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Josh, looks good to me, and I can take care of deprecating fgMaxTasksPerFilePerWorker & friends. I don't 100% trust my Friday evening brain though, so I'll give it a second thought on Monday and merge then 😄

@eguiraud
Copy link
Copy Markdown
Contributor

eguiraud commented Feb 1, 2021

@bendavid do you have anything against renaming {S,G}etTaksPerWorker to {S,G}etMaxTasksPerWorker? So it's clear that it's not a precise setting.

@bendavid
Copy link
Copy Markdown
Contributor Author

bendavid commented Feb 1, 2021

No strong feelings, but the purpose of the name change was to reflect that it's not a hard upper ceiling either, just a target.

@eguiraud
Copy link
Copy Markdown
Contributor

eguiraud commented Feb 1, 2021

Let me see if I understand correctly: with these changes every file has a hard cap at ceil(tasksPerWorker * nWorkers / nFiles) tasks, so in total we'll have at most ceil(tasksPerWorker * nWorkers / nFiles) * nFiles tasks. So tasksPerWorker*nWorkers is a hard upper ceiling to the total number of tasks, modulo a potential (EDIT: maximum of) nFiles tasks more due to the effect of ceil. Was that what you were referring to, or am I missing something else?

@bendavid
Copy link
Copy Markdown
Contributor Author

bendavid commented Feb 1, 2021

Yes that's right.

@eguiraud
Copy link
Copy Markdown
Contributor

eguiraud commented Feb 1, 2021

Alright. Indeed the extreme case in which a user sets fgTasksPerWorker to 1 but anyway ends up with one task per file might be surprising if we call it SetMaxTasksPerWorker (EDIT: but also if we call it SetTasksPerWorker). 🤔 SetTasksPerWorkerHint? That should manage user expectations, and I can add a docstring explaining that we will do our best to honor that given internal constraints. It's verbose, but it's an expert interface, I'd rather be verbose than opaque.

@eguiraud
Copy link
Copy Markdown
Contributor

eguiraud commented Feb 8, 2021

@bendavid can you please switch the name to SetTasksPerWorkerHint and do a force push? Then I'll add the deprecation of the old method to this same PR -- I guess I will make SetMaxTasksPerFilePerWorker(n) print a deprecation warning and in the meanwhile I'll make it equivalent to SetTasksPerWorkerHint(n*nFiles) (not quite the same as before, but better than a no-op I guess).

@bendavid bendavid force-pushed the improvetasksplitting branch from b30fabe to 83e848b Compare February 16, 2021 18:57
@bendavid
Copy link
Copy Markdown
Contributor Author

Done.

@eguiraud
Copy link
Copy Markdown
Contributor

@phsft-bot build please

@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-fedora30/cxx14, ROOT-fedora31/noimt, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-performance-centos8-multicore/default.
Running on null:/data/sftnight/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-fedora30/cxx14.
Running on root-fedora30-2.cern.ch:/home/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-ubuntu16/nortcxxmod.
Running on sft-ubuntu-1604-2.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on mac11.0/cxx17.
Running on macphsft23.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on mac1014/python3.
Running on macitois21.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-debian10-i386/cxx14.
Running on pcepsft10.dyndns.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

Errors:

  • [2021-02-17T22:47:14.892Z] CMake Error at C:/build/workspace/root-pullrequests-build/rootspi/jenkins/root-build.cmake:1070 (ctest_start):

Co-authored-by: Enric Tejedor Saavedra <[email protected]>
@eguiraud
Copy link
Copy Markdown
Contributor

@phsft-bot build please

@phsft-bot
Copy link
Copy Markdown

Starting build on ROOT-debian10-i386/cxx14, ROOT-performance-centos8-multicore/default, ROOT-fedora30/cxx14, ROOT-fedora31/noimt, ROOT-ubuntu16/nortcxxmod, mac1014/python3, mac11.0/cxx17, windows10/cxx14
How to customize builds

Copy link
Copy Markdown
Contributor

@eguiraud eguiraud left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Building one last time before merging, but I think this is good to go. root-project/roottest#660 updates roottest_root_multicore_tp_process_imt which fails with this patch.

@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-ubuntu16/nortcxxmod.
Running on sft-ubuntu-1604-4.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on mac1014/python3.
Running on macphsft17.dyndns.cern.ch:/build/jenkins/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-performance-centos8-multicore/default.
Running on null:/data/sftnight/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-debian10-i386/cxx14.
Running on pcepsft10.dyndns.cern.ch:/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on mac11.0/cxx17.
Running on macphsft20.dyndns.cern.ch:/Users/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on ROOT-fedora30/cxx14.
Running on root-fedora30-2.cern.ch:/home/sftnight/build/workspace/root-pullrequests-build
See console output.

Failing tests:

@phsft-bot
Copy link
Copy Markdown

Build failed on windows10/cxx14.
Running on null:C:\build\workspace\root-pullrequests-build
See console output.

@eguiraud eguiraud merged commit b7e8bb7 into root-project:master Feb 19, 2021
@eguiraud
Copy link
Copy Markdown
Contributor

Thank you very much @bendavid !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants