Skip to content

[python] Address memory leak in get_enumeration_values#4065

Closed
johnkerl wants to merge 1 commit intomainfrom
kerl/seal-gev
Closed

[python] Address memory leak in get_enumeration_values#4065
johnkerl wants to merge 1 commit intomainfrom
kerl/seal-gev

Conversation

@johnkerl
Copy link
Copy Markdown
Contributor

@johnkerl johnkerl commented May 19, 2025

Issue and/or context: https://linear.app/tiledb/issue/SOMA-130/c-memory-leak-in-get-enumeration-values

Changes:

Notes for Reviewer:

I ran this from @bkmartinjr on SOMA-130:

cat 130.py

import pandas as pd
import pyarrow as pa
import shutil

import tiledbsoma as soma


def main():
    uri = "foobar"
    data, expected_levels = [False, True, False, True, False], [False, True]
    n = len(data)
    domain = [[0, n - 1]]

    schema = pa.schema(
        [
            pa.field("bool_enum", pa.dictionary(pa.int8(), pa.bool_())),
        ]
    )

    shutil.rmtree(uri, ignore_errors=True)

    # Create the dataframe with no expected_levels for any enumerated column
    with soma.DataFrame.create(uri, schema=schema, domain=domain) as sdf:
        pass

    pd_data = {
        "soma_joinid": list(range(n)),
        "bool_enum": pd.Categorical(data),
    }
    arrow_data = pa.Table.from_pydict(pd_data)

    with soma.DataFrame.open(uri, "w") as sdf:
        sdf.write(arrow_data)

    for i in range(200):
        with soma.DataFrame.open(uri) as sdf:
            actual = sdf.get_enumeration_values(["bool_enum"])
            expect = {
                "bool_enum": pa.array(expected_levels, type=pa.bool_()),
            }
            assert actual == expect


main()
valgrind --error-limit=no --trace-children=yes -s --leak-check=full $(which python) 130.py

@codecov
Copy link
Copy Markdown

codecov Bot commented May 19, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.26%. Comparing base (f29839d) to head (3497c80).
Report is 2 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #4065      +/-   ##
==========================================
+ Coverage   88.97%   89.26%   +0.29%     
==========================================
  Files          59       59              
  Lines        7073     7073              
==========================================
+ Hits         6293     6314      +21     
+ Misses        780      759      -21     
Flag Coverage Δ
python 89.26% <ø> (+0.29%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Components Coverage Δ
python_api 89.26% <ø> (+0.29%) ⬆️
libtiledbsoma ∅ <ø> (∅)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@johnkerl johnkerl changed the title [python] Address memory leak in get_enumeration_values [WIP] [python] Address memory leak in get_enumeration_values May 19, 2025
@johnkerl johnkerl marked this pull request as ready for review May 19, 2025 20:46
@bkmartinjr
Copy link
Copy Markdown
Member

It looks like there are other leaks in this call chain. To validate the fix, I ran the following:

valgrind --max-threads=4096 --leak-check=full pytest -s -x -v apis/python/tests/test_dataframe.py

Which produced the following leak inside get_enumeration_values

==687422== 510 bytes in 255 blocks are definitely lost in loss record 6,792 of 7,810
==687422==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==687422==    by 0x4A0E58E: strdup (strdup.c:42)
==687422==    by 0x873E0309: tiledbsoma::ArrowAdapter::arrow_schema_from_tiledb_attribute(tiledb::Attribute const&, tiledb::Context const&, tiledb::Array const&) (in /home/bruce/projects/TileDB-SOMA/dist/lib/libtiledbsoma.so)
==687422==    by 0x873579A2: tiledbsoma::SOMAAttribute::arrow_schema_slot(tiledbsoma::SOMAContext const&, tiledb::Array&) const (in /home/bruce/projects/TileDB-SOMA/dist/lib/libtiledbsoma.so)
==687422==    by 0x873243D2: tiledbsoma::SOMAArray::get_enumeration_values_for_column(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) (in /home/bruce/projects/TileDB-SOMA/dist/lib/libtiledbsoma.so)
==687422==    by 0x87326A3F: tiledbsoma::SOMAArray::get_enumeration_values(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >) (in /home/bruce/projects/TileDB-SOMA/dist/lib/libtiledbsoma.so)
==687422==    by 0x87005553: libtiledbsomacpp::load_soma_dataframe(pybind11::module_&)::{lambda(tiledbsoma::SOMADataFrame&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >)#2}::operator()(tiledbsoma::SOMADataFrame&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >) const [clone .constprop.0] (in /home/bruce/projects/TileDB-SOMA/apis/python/src/tiledbsoma/pytiledbsoma.cpython-312-x86_64-linux-gnu.so)
==687422==    by 0x8700BA18: pybind11::cpp_function::initialize<libtiledbsomacpp::load_soma_dataframe(pybind11::module_&)::{lambda(tiledbsoma::SOMADataFrame&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >)#2}, pybind11::dict, tiledbsoma::SOMADataFrame&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg>(libtiledbsomacpp::load_soma_dataframe(pybind11::module_&)::{lambda(tiledbsoma::SOMADataFrame&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >)#2}&&, pybind11::dict (*)(tiledbsoma::SOMADataFrame&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) (in /home/bruce/projects/TileDB-SOMA/apis/python/src/tiledbsoma/pytiledbsoma.cpython-312-x86_64-linux-gnu.so)
==687422==    by 0x86D05663: pybind11::cpp_function::dispatcher(_object*, _object*, _object*) (in /home/bruce/projects/TileDB-SOMA/apis/python/src/tiledbsoma/pytiledbsoma.cpython-312-x86_64-linux-gnu.so)
==687422==    by 0x323D37: cfunction_call (methodobject.c:537)
==687422==    by 0x30726B: _PyObject_MakeTpCall (call.c:240)
==687422==    by 0x21AB4E: _PyEval_EvalFrameDefault.cold (bytecodes.c:2715)

@bkmartinjr
Copy link
Copy Markdown
Member

A similar valgrind on apis/python/tests/test_registration_mappings.py shows what is likely the same leak as above:

==705854== 340 bytes in 170 blocks are definitely lost in loss record 6,571 of 7,680
==705854==    at 0x4848899: malloc (in /usr/libexec/valgrind/vgpreload_memcheck-amd64-linux.so)
==705854==    by 0x4A0E58E: strdup (strdup.c:42)
==705854==    by 0x873E0309: tiledbsoma::ArrowAdapter::arrow_schema_from_tiledb_attribute(tiledb::Attribute const&, tiledb::Context const&, tiledb::Array const&) (in /home/bruce/projects/TileDB-SOMA/dist/lib/libtiledbsoma.so)
==705854==    by 0x873579A2: tiledbsoma::SOMAAttribute::arrow_schema_slot(tiledbsoma::SOMAContext const&, tiledb::Array&) const (in /home/bruce/projects/TileDB-SOMA/dist/lib/libtiledbsoma.so)
==705854==    by 0x873243D2: tiledbsoma::SOMAArray::get_enumeration_values_for_column(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) (in /home/bruce/projects/TileDB-SOMA/dist/lib/libtiledbsoma.so)
==705854==    by 0x87326A3F: tiledbsoma::SOMAArray::get_enumeration_values(std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >) (in /home/bruce/projects/TileDB-SOMA/dist/lib/libtiledbsoma.so)
==705854==    by 0x87005553: libtiledbsomacpp::load_soma_dataframe(pybind11::module_&)::{lambda(tiledbsoma::SOMADataFrame&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >)#2}::operator()(tiledbsoma::SOMADataFrame&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >) const [clone .constprop.0] (in /home/bruce/projects/TileDB-SOMA/apis/python/src/tiledbsoma/pytiledbsoma.cpython-312-x86_64-linux-gnu.so)
==705854==    by 0x8700BA18: pybind11::cpp_function::initialize<libtiledbsomacpp::load_soma_dataframe(pybind11::module_&)::{lambda(tiledbsoma::SOMADataFrame&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >)#2}, pybind11::dict, tiledbsoma::SOMADataFrame&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >, pybind11::name, pybind11::is_method, pybind11::sibling, pybind11::arg>(libtiledbsomacpp::load_soma_dataframe(pybind11::module_&)::{lambda(tiledbsoma::SOMADataFrame&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >)#2}&&, pybind11::dict (*)(tiledbsoma::SOMADataFrame&, std::vector<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::allocator<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > > >), pybind11::name const&, pybind11::is_method const&, pybind11::sibling const&, pybind11::arg const&)::{lambda(pybind11::detail::function_call&)#3}::_FUN(pybind11::detail::function_call&) (in /home/bruce/projects/TileDB-SOMA/apis/python/src/tiledbsoma/pytiledbsoma.cpython-312-x86_64-linux-gnu.so)
==705854==    by 0x86D05663: pybind11::cpp_function::dispatcher(_object*, _object*, _object*) (in /home/bruce/projects/TileDB-SOMA/apis/python/src/tiledbsoma/pytiledbsoma.cpython-312-x86_64-linux-gnu.so)
==705854==    by 0x323D37: cfunction_call (methodobject.c:537)
==705854==    by 0x30726B: _PyObject_MakeTpCall (call.c:240)
==705854==    by 0x21AB4E: _PyEval_EvalFrameDefault.cold (bytecodes.c:2715)

@johnkerl
Copy link
Copy Markdown
Contributor Author

Closing in favor of #4066

@johnkerl johnkerl closed this May 20, 2025
@johnkerl johnkerl deleted the kerl/seal-gev branch June 30, 2025 17:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants