Skip to content

[c++/python] Addition of SOMAContext to reduce thread count#2158

Merged
nguyenv merged 12 commits intomainfrom
viviannguyen/reduce-thread-count
Feb 23, 2024
Merged

[c++/python] Addition of SOMAContext to reduce thread count#2158
nguyenv merged 12 commits intomainfrom
viviannguyen/reduce-thread-count

Conversation

@nguyenv
Copy link
Copy Markdown
Contributor

@nguyenv nguyenv commented Feb 21, 2024

Issue and/or context:

#2149

Locally, the thread count with these changes is now:

tiledbsoma.__version__        1.5.0rc0.post208.dev4244460900
TileDB-Py tiledb.version()    (0, 25, 0)
TileDB core version           2.19.0
libtiledbsoma version()       libtiledb=2.21.0
python version                3.11.6.final.0
OS version                    Linux 4.19.128-microsoft-standard
None
Start                           : pid=31616, cpus=8 number of threads=16
After context                   : pid=31616, cpus=8 number of threads=37
After open dataframe            : pid=31616, cpus=8 number of threads=37
After read                      : pid=31616, cpus=8 number of threads=53
After close                     : pid=31616, cpus=8 number of threads=53
After GC                        : pid=31616, cpus=8 number of threads=53
After context                   : pid=31616, cpus=8 number of threads=53
After open dataframe            : pid=31616, cpus=8 number of threads=53
After read                      : pid=31616, cpus=8 number of threads=53
After close                     : pid=31616, cpus=8 number of threads=53
After GC                        : pid=31616, cpus=8 number of threads=53
After context                   : pid=31616, cpus=8 number of threads=53
After open dataframe            : pid=31616, cpus=8 number of threads=53
After read                      : pid=31616, cpus=8 number of threads=53
After close                     : pid=31616, cpus=8 number of threads=53
After GC                        : pid=31616, cpus=8 number of threads=53
After context                   : pid=31616, cpus=8 number of threads=53
After open dataframe            : pid=31616, cpus=8 number of threads=53
After read                      : pid=31616, cpus=8 number of threads=53
After close                     : pid=31616, cpus=8 number of threads=53
After GC                        : pid=31616, cpus=8 number of threads=53

With 1.7, after the DataFrame refactoring, the counts were

tiledbsoma.__version__        1.7.0
TileDB-Py tiledb.version()    (0, 25, 0)
TileDB core version           2.19.0
libtiledbsoma version()       libtiledb=2.19.1
python version                3.11.0.final.0
OS version                    Linux 4.19.128-microsoft-standard
None
Start                           : pid=712, cpus=8 number of threads=16
After context                   : pid=712, cpus=8 number of threads=16
After open dataframe            : pid=712, cpus=8 number of threads=58
After read                      : pid=712, cpus=8 number of threads=74
After close                     : pid=712, cpus=8 number of threads=74
After GC                        : pid=712, cpus=8 number of threads=58
After context                   : pid=712, cpus=8 number of threads=58
After open dataframe            : pid=712, cpus=8 number of threads=74
After read                      : pid=712, cpus=8 number of threads=74
After close                     : pid=712, cpus=8 number of threads=74
After GC                        : pid=712, cpus=8 number of threads=58
After context                   : pid=712, cpus=8 number of threads=58
After open dataframe            : pid=712, cpus=8 number of threads=74
After read                      : pid=712, cpus=8 number of threads=74
After close                     : pid=712, cpus=8 number of threads=74
After GC                        : pid=712, cpus=8 number of threads=58
After context                   : pid=712, cpus=8 number of threads=58
After open dataframe            : pid=712, cpus=8 number of threads=74
After read                      : pid=712, cpus=8 number of threads=74
After close                     : pid=712, cpus=8 number of threads=74
After GC                        : pid=712, cpus=8 number of threads=58

With 1.6, prior to the DataFrame refactor:

tiledbsoma.__version__        1.6.0
TileDB-Py tiledb.version()    (0, 24, 0)
TileDB core version           2.18.2
libtiledbsoma version()       libtiledb=2.18.2
python version                3.11.6.final.0
OS version                    Linux 4.19.128-microsoft-standard
None
Start                           : pid=13843, cpus=8 number of threads=16
After context                   : pid=13843, cpus=8 number of threads=16
After open dataframe            : pid=13843, cpus=8 number of threads=37
After read                      : pid=13843, cpus=8 number of threads=58
After close                     : pid=13843, cpus=8 number of threads=58
After GC                        : pid=13843, cpus=8 number of threads=58
After context                   : pid=13843, cpus=8 number of threads=58
After open dataframe            : pid=13843, cpus=8 number of threads=58
After read                      : pid=13843, cpus=8 number of threads=58
After close                     : pid=13843, cpus=8 number of threads=58
After GC                        : pid=13843, cpus=8 number of threads=58
After context                   : pid=13843, cpus=8 number of threads=58
After open dataframe            : pid=13843, cpus=8 number of threads=58
After read                      : pid=13843, cpus=8 number of threads=58
After close                     : pid=13843, cpus=8 number of threads=58
After GC                        : pid=13843, cpus=8 number of threads=58
After context                   : pid=13843, cpus=8 number of threads=58
After open dataframe            : pid=13843, cpus=8 number of threads=58
After read                      : pid=13843, cpus=8 number of threads=58
After close                     : pid=13843, cpus=8 number of threads=58
After GC                        : pid=13843, cpus=8 number of threads=58

Changes:

  • Introduce SOMAContext to replace tiledb::Context arguments. Ultimately it is just a very simple wrapper around tiledb::Context but will aid in transitioning us out of tiledb-py usage by slowly eliminating usage of tiledb.Ctx() in favor of SOMAContext
  • We can now pass SOMAContext between C++ and Python and avoid needing to create new Contexts unless necessary
  • Modify SOMACollection to directly inherit from SOMAGroup so that they may be dynamically casted in the SOMAObject::open factory

Notes for Reviewer:

  • Applying query conditions greatly increases the number of threads. Without applying the filter, the thread count remains at 37 threads even after the dataframe read. It is not really possible to work around until the parser is implemented in core

Comment thread apis/python/src/tiledbsoma/_tiledb_object.py
Comment thread apis/python/src/tiledbsoma/options/_soma_tiledb_context.py Outdated
@codecov
Copy link
Copy Markdown

codecov Bot commented Feb 21, 2024

Codecov Report

Merging #2158 (638c7f5) into main (6c737ae) will decrease coverage by 6.32%.
The diff coverage is 71.28%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2158      +/-   ##
==========================================
- Coverage   78.39%   72.08%   -6.32%     
==========================================
  Files         135      102      -33     
  Lines       10699     6879    -3820     
  Branches      211      215       +4     
==========================================
- Hits         8388     4959    -3429     
+ Misses       2209     1821     -388     
+ Partials      102       99       -3     
Flag Coverage Δ
libtiledbsoma 67.64% <71.28%> (+0.79%) ⬆️
python ?
r 74.68% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
python_api ∅ <ø> (∅)
libtiledbsoma 48.75% <51.51%> (+0.70%) ⬆️

@nguyenv nguyenv marked this pull request as ready for review February 21, 2024 19:47
johnkerl added a commit that referenced this pull request Feb 22, 2024
…2165)

* Sync `main` to `release-1.7` as much as possible

* add new files
@nguyenv nguyenv requested a review from bkmartinjr February 22, 2024 21:45
Comment thread apis/python/src/tiledbsoma/_tdb_handles.py Outdated
Comment thread apis/python/src/tiledbsoma/soma_collection.cc Outdated
Comment thread apis/python/src/tiledbsoma/soma_array.cc Outdated
Comment thread apis/python/src/tiledbsoma/soma_group.cc Outdated
Comment thread apis/python/src/tiledbsoma/options/_soma_tiledb_context.py Outdated
Comment thread apis/python/src/tiledbsoma/options/_soma_tiledb_context.py Outdated
Copy link
Copy Markdown
Member

@bkmartinjr bkmartinjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only substantive comment is related to a missing lock around context instantiation in _soma_tiledb_context.py. It looks like it needs a lock.

Copy link
Copy Markdown
Member

@bkmartinjr bkmartinjr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@nguyenv nguyenv force-pushed the viviannguyen/reduce-thread-count branch from 6e47da4 to 2421648 Compare February 23, 2024 14:45
Comment thread apis/python/src/tiledbsoma/options/_soma_tiledb_context.py
@nguyenv nguyenv merged commit 5b8daae into main Feb 23, 2024
@nguyenv nguyenv deleted the viviannguyen/reduce-thread-count branch February 23, 2024 17:01
@github-actions
Copy link
Copy Markdown

The backport to release-1.7 failed:

The process '/usr/bin/git' failed with exit code 1

To backport manually, run these commands in your terminal:

# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add .worktrees/backport-release-1.7 release-1.7
# Navigate to the new working tree
cd .worktrees/backport-release-1.7
# Create a new branch
git switch --create backport-2158-to-release-1.7
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick --mainline 1 5b8daaee382a1a75aa2233afb78db73d7b0944f5
# Push it to GitHub
git push --set-upstream origin backport-2158-to-release-1.7
# Go back to the original working tree
cd ../..
# Delete the working tree
git worktree remove .worktrees/backport-release-1.7

Then, create a pull request where the base branch is release-1.7 and the compare/head branch is backport-2158-to-release-1.7.

Comment thread apis/python/src/tiledbsoma/options/_soma_tiledb_context.py
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants