feat(python): add context manager support for safe resource cleanup#412
Conversation
Signed-off-by: yogyam <[email protected]>
There was a problem hiding this comment.
Pull request overview
This pull request adds context manager support to the Python bindings for safe resource cleanup. The changes address issue #388 by introducing CactusModel and CactusIndex wrapper classes that implement the context manager protocol (__enter__/__exit__), ensuring that cactus_destroy and cactus_index_destroy are called automatically, even when exceptions occur.
Changes:
- Added
CactusModelandCactusIndexcontext manager classes with automatic cleanup and use-after-destroy protection - Updated
__init__.pywith lazy imports for the new classes - Modernized
example.pyto demonstrate the new context manager pattern - Added comprehensive unit tests with mocked FFI layer
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 11 comments.
| File | Description |
|---|---|
| python/src/cactus.py | Added CactusModel and CactusIndex context manager classes with enter/exit support, double-free protection, and use-after-destroy guards |
| python/src/init.py | Added lazy imports for CactusModel and CactusIndex classes |
| python/example.py | Updated to demonstrate context manager usage pattern with both model types |
| python/tests/test_context_manager.py | Added 8 unit tests covering normal exit, exception handling, double-destroy safety, and use-after-destroy error handling |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| """ | ||
|
|
||
| def __init__(self, model_path, corpus_dir=None, cache_index=False): | ||
| self._handle = cactus_init(model_path, corpus_dir, cache_index) |
There was a problem hiding this comment.
The init method should check if cactus_init returns None (which indicates initialization failure) and raise an appropriate exception. According to the cactus_init docstring at line 207, it returns None on failure. Without this check, the context manager will have a None handle and operations will fail with less informative error messages. Consider checking the handle and calling cactus_get_last_error() to provide a meaningful error message.
| self._handle = cactus_init(model_path, corpus_dir, cache_index) | |
| self._handle = cactus_init(model_path, corpus_dir, cache_index) | |
| if self._handle is None: | |
| # Retrieve a descriptive error message from the underlying library, if available. | |
| try: | |
| error_msg = cactus_get_last_error() | |
| except NameError: | |
| error_msg = None | |
| if error_msg: | |
| raise RuntimeError(f"Failed to initialize CactusModel: {error_msg}") | |
| raise RuntimeError("Failed to initialize CactusModel: cactus_init returned None") |
| """ | ||
|
|
||
| def __init__(self, index_dir, embedding_dim): | ||
| self._handle = cactus_index_init(index_dir, embedding_dim) |
There was a problem hiding this comment.
The init method should check if cactus_index_init returns None (which indicates initialization failure) and raise an appropriate exception. According to the cactus_index_init docstring at line 654, it returns None on failure. Without this check, the context manager will have a None handle and operations will fail with less informative error messages. Consider checking the handle and raising a RuntimeError with details from cactus_get_last_error().
| self._handle = cactus_index_init(index_dir, embedding_dim) | |
| self._handle = cactus_index_init(index_dir, embedding_dim) | |
| if self._handle is None: | |
| # Initialization failed; surface a clear error with details from the C library | |
| error_msg = cactus_get_last_error() | |
| if not error_msg: | |
| raise RuntimeError("Failed to initialize CactusIndex (no additional error information provided)") | |
| raise RuntimeError(f"Failed to initialize CactusIndex: {error_msg}") |
| self._require_handle() | ||
| return cactus_rag_query(self._handle, query, top_k) | ||
|
|
||
| def stream_transcribe_start(self, options=None, language="en"): |
There was a problem hiding this comment.
Consider adding docstrings to the wrapper methods (complete, transcribe, embed, etc.) that reference or summarize the documentation from their underlying functions. This would improve discoverability and make the API easier to use, especially since users may look up help on these methods directly rather than the underlying free functions.
| def stream_transcribe_start(self, options=None, language="en"): | |
| def stream_transcribe_start(self, options=None, language="en"): | |
| """ | |
| Start a streaming transcription session with the current model. | |
| This is a convenience wrapper around :func:`cactus_stream_transcribe_start`. | |
| Args: | |
| options: Optional transcription options dict, passed directly to | |
| ``cactus_stream_transcribe_start``. | |
| language (str): Language code (e.g. "en") for the audio being transcribed. | |
| Returns: | |
| The handle or session identifier returned by ``cactus_stream_transcribe_start``. | |
| """ |
| def add(self, ids, documents, embeddings, metadatas=None): | ||
| self._require_handle() | ||
| return cactus_index_add(self._handle, ids, documents, embeddings, metadatas) | ||
|
|
||
| def delete(self, ids): | ||
| self._require_handle() | ||
| return cactus_index_delete(self._handle, ids) | ||
|
|
||
| def query(self, embedding, top_k=5, options=None): | ||
| self._require_handle() | ||
| return cactus_index_query(self._handle, embedding, top_k, options) | ||
|
|
||
| def compact(self): |
There was a problem hiding this comment.
Consider adding docstrings to the wrapper methods (add, delete, query, compact) that reference or summarize the documentation from their underlying functions. This would improve discoverability and make the API easier to use, especially since users may look up help on these methods directly rather than the underlying free functions.
| def add(self, ids, documents, embeddings, metadatas=None): | |
| self._require_handle() | |
| return cactus_index_add(self._handle, ids, documents, embeddings, metadatas) | |
| def delete(self, ids): | |
| self._require_handle() | |
| return cactus_index_delete(self._handle, ids) | |
| def query(self, embedding, top_k=5, options=None): | |
| self._require_handle() | |
| return cactus_index_query(self._handle, embedding, top_k, options) | |
| def compact(self): | |
| def add(self, ids, documents, embeddings, metadatas=None): | |
| """Add vectors and associated data to the index. | |
| This is a convenience wrapper around :func:`cactus_index_add`. | |
| Args: | |
| ids: Iterable of application-defined IDs for each vector. | |
| documents: Iterable of raw text (or other payload) for each vector. | |
| embeddings: Iterable or array-like of embedding vectors. | |
| metadatas: Optional iterable of metadata objects aligned with ``ids``. | |
| Returns: | |
| The result returned by :func:`cactus_index_add`, typically a status | |
| or summary of the added entries. | |
| """ | |
| self._require_handle() | |
| return cactus_index_add(self._handle, ids, documents, embeddings, metadatas) | |
| def delete(self, ids): | |
| """Delete vectors from the index by ID. | |
| This is a convenience wrapper around :func:`cactus_index_delete`. | |
| Args: | |
| ids: Iterable of IDs to remove from the index. | |
| Returns: | |
| The result returned by :func:`cactus_index_delete`, typically a status | |
| or count of deleted items. | |
| """ | |
| self._require_handle() | |
| return cactus_index_delete(self._handle, ids) | |
| def query(self, embedding, top_k=5, options=None): | |
| """Query the index for the nearest neighbors of a vector. | |
| This is a convenience wrapper around :func:`cactus_index_query`. | |
| Args: | |
| embedding: The query embedding vector. | |
| top_k: Maximum number of nearest neighbors to return. | |
| options: Optional dict of backend-specific query options. | |
| Returns: | |
| The result returned by :func:`cactus_index_query`, typically a | |
| structure containing matched IDs, scores, and associated payloads. | |
| """ | |
| self._require_handle() | |
| return cactus_index_query(self._handle, embedding, top_k, options) | |
| def compact(self): | |
| """Compact the index, reclaiming space and optimizing layout. | |
| This is a convenience wrapper around :func:`cactus_index_compact`. | |
| Returns: | |
| The result returned by :func:`cactus_index_compact`, if any. | |
| """ |
| # cactus_index_destroy called automatically, even on errors | ||
| """ | ||
|
|
||
| def __init__(self, index_dir, embedding_dim): |
There was a problem hiding this comment.
Consider adding a docstring to the init method that documents the parameters (index_dir, embedding_dim) and their purpose. This would help users understand what arguments to pass when creating the context manager directly (outside of a with statement).
| def __init__(self, index_dir, embedding_dim): | |
| def __init__(self, index_dir, embedding_dim): | |
| """Initialize a vector index handle. | |
| Parameters: | |
| index_dir: Path to the directory where the index is stored or will be created. | |
| embedding_dim: Integer dimensionality of the embedding vectors stored in the index. | |
| """ |
python/tests/test_context_manager.py
Outdated
| class TestCactusModelContextManager(unittest.TestCase): | ||
|
|
||
| @patch.object(cactus, "cactus_destroy") | ||
| @patch.object(cactus, "cactus_init", return_value=MOCK_HANDLE) | ||
| def test_with_calls_destroy(self, mock_init, mock_destroy): | ||
| with cactus.CactusModel("weights/test") as model: | ||
| self.assertIsNotNone(model._handle) | ||
|
|
||
| mock_init.assert_called_once_with("weights/test", None, False) | ||
| mock_destroy.assert_called_once_with(MOCK_HANDLE) | ||
|
|
||
| @patch.object(cactus, "cactus_destroy") | ||
| @patch.object(cactus, "cactus_init", return_value=MOCK_HANDLE) | ||
| def test_with_calls_destroy_on_exception(self, mock_init, mock_destroy): | ||
| with self.assertRaises(ValueError): | ||
| with cactus.CactusModel("weights/test") as model: | ||
| raise ValueError("simulated error") | ||
|
|
||
| mock_destroy.assert_called_once_with(MOCK_HANDLE) | ||
|
|
||
| @patch.object(cactus, "cactus_destroy") | ||
| @patch.object(cactus, "cactus_init", return_value=MOCK_HANDLE) | ||
| def test_double_destroy_safe(self, mock_init, mock_destroy): | ||
| model = cactus.CactusModel("weights/test") | ||
| model.destroy() | ||
| model.destroy() | ||
|
|
||
| mock_destroy.assert_called_once_with(MOCK_HANDLE) | ||
|
|
||
| @patch.object(cactus, "cactus_destroy") | ||
| @patch.object(cactus, "cactus_init", return_value=MOCK_HANDLE) | ||
| def test_method_after_destroy_raises(self, mock_init, mock_destroy): | ||
| model = cactus.CactusModel("weights/test") | ||
| model.destroy() | ||
|
|
||
| with self.assertRaises(RuntimeError): | ||
| model.complete([]) | ||
|
|
||
| with self.assertRaises(RuntimeError): | ||
| model.embed("text") | ||
|
|
||
| with self.assertRaises(RuntimeError): | ||
| model.transcribe("audio.wav") | ||
|
|
There was a problem hiding this comment.
Consider adding test cases for initialization failure scenarios (when cactus_init returns None). This would help verify that the context manager properly handles and reports initialization errors, which is currently not tested.
python/tests/test_context_manager.py
Outdated
|
|
||
|
|
There was a problem hiding this comment.
Consider adding test cases for initialization failure scenarios (when cactus_index_init returns None). This would help verify that the context manager properly handles and reports initialization errors, which is currently not tested.
| @patch.object(cactus, "cactus_index_destroy") | |
| @patch.object(cactus, "cactus_index_init", return_value=None) | |
| def test_init_failure_raises_and_does_not_destroy(self, mock_init, mock_destroy): | |
| # When cactus_index_init fails (returns None), the context manager | |
| # should raise and never call cactus_index_destroy. | |
| with self.assertRaises(RuntimeError): | |
| with cactus.CactusIndex("/tmp/index", 384) as index: | |
| # Body should never execute if initialization fails. | |
| self.fail("Context body executed despite initialization failure") | |
| mock_init.assert_called_once_with("/tmp/index", 384) | |
| mock_destroy.assert_not_called() |
| # cactus_destroy called automatically, even on errors | ||
| """ | ||
|
|
||
| def __init__(self, model_path, corpus_dir=None, cache_index=False): |
There was a problem hiding this comment.
Consider adding a docstring to the init method that documents the parameters (model_path, corpus_dir, cache_index) and their purpose. This would help users understand what arguments to pass when creating the context manager directly (outside of a with statement).
| def __init__(self, model_path, corpus_dir=None, cache_index=False): | |
| def __init__(self, model_path, corpus_dir=None, cache_index=False): | |
| """ | |
| Initialize a CactusModel instance. | |
| Args: | |
| model_path: Path to the model weights or configuration used to initialize the Cactus model. | |
| corpus_dir: Optional directory containing corpus or index data for retrieval-augmented generation. | |
| cache_index: If True, cache the index in memory for faster subsequent access. | |
| """ |
| if self._handle is None: | ||
| raise RuntimeError("Model has been destroyed") | ||
|
|
||
| def destroy(self): |
There was a problem hiding this comment.
Consider adding a docstring to the destroy method explaining that it's safe to call multiple times and that it's automatically called when using the context manager. This would help users who might want to manually call destroy in certain scenarios.
| def destroy(self): | |
| def destroy(self): | |
| """Release the underlying model resources. | |
| This method is safe to call multiple times; after the first call it | |
| becomes a no-op. It is called automatically when using CactusModel | |
| as a context manager (i.e., at the end of a ``with`` block). | |
| """ |
| if self._handle is None: | ||
| raise RuntimeError("Index has been destroyed") | ||
|
|
||
| def destroy(self): |
There was a problem hiding this comment.
Consider adding a docstring to the destroy method explaining that it's safe to call multiple times and that it's automatically called when using the context manager. This would help users who might want to manually call destroy in certain scenarios.
| def destroy(self): | |
| def destroy(self): | |
| """Release the underlying index resources. | |
| This method is safe to call multiple times. After the first call, | |
| the internal handle is set to None and subsequent calls will be | |
| no-ops. | |
| When using CactusIndex as a context manager (with CactusIndex(...) as index:), | |
| this method is called automatically from __exit__, even if an exception | |
| is raised inside the with block. Manual calls to destroy are only | |
| needed when managing the lifecycle explicitly without a with statement. | |
| """ |
|
At a glance it looks like its going in a good direction. It does add quite a bit more lines of code, but I can see the intent for it. The backwards compat seems like it should be okay since it does not break the prev syntax style. Co-Pilot is asking to add very long and verbose docstrings. I think these are unreadable, imo. I think its better to find a good place to demonstrate how to migrate one's code from the old style to the new one that uses a more concise api. This is also something repo owner (@HenryNdubuaku) should have input on because it is pretty significant. If you can verify everything with a full build an a model that would be good. Golden, if you can prove your work with some kind of address sanitizer tool. Will depend on your platform. Redundancy checks like this will be helpful for future contribs as well. With that being said, checking for memory leaks using an address sanitizer tool might be trickier than the scope of this issue. Please feel free to investigate. |
Signed-off-by: yogyam <[email protected]>
Signed-off-by: yogyam <[email protected]>
…xample Signed-off-by: yogyam <[email protected]>
|
Thanks for the review. On my recent commits, I added a migration guide to python/README.md showing old vs new pattern. Restored example.py to its original state — instead of modifying it, I added a separate example_context_manager.py to keep things non-invasive. I ran the full build verification — cactus build --python + cactus download google/gemma-3-270m-it — and all integration tests pass: normal with-block, cleanup on exception, use-after-destroy protection, and double-destroy safety. The mock-based unit tests also pass. Happy for you or @HenryNdubuaku to run on your end as well. |
|
thanks for this @yogyam !
Thanks! |
…ager * Remedied PR bloat by moving standalone _lib wrapper functions directly into CactusModel/CactusIndex * Removed redundant example_context_manager.py and uses example.py instead * Updated README.md to reflect CactusModel as primary Python FFI API * Ensured zero impact on cli.py and core executable * Stripped docstrings to adhere strictly to contributing guidelines Signed-off-by: yogyam <[email protected]>
|
Thanks for the feedback! I deleted the extra example file and moved the code into the existing example.py, refactored the Python bindings to make the CactusModel context manager the default API, updated the docs to reflect this, and verified that these changes don't impact the CLI. Let me know what more needs to be done! |
Signed-off-by: HenryNdubuaku <[email protected]>
|
Thanks for this @yogyam merging now, i decided to retain the old design too, for backward compatibility, while this becomes the new way. |
…actus-compute#412) * feat(python): add context manager support for safe resource cleanup Signed-off-by: yogyam <[email protected]> * docs: add context manager migration guide to python README Signed-off-by: yogyam <[email protected]> * style: trim verbose comments to match project guidelines Signed-off-by: yogyam <[email protected]> * refactor: restore original example.py, add separate context manager example Signed-off-by: yogyam <[email protected]> * refactor(python): encapsulate C-bindings into CactusModel context manager * Remedied PR bloat by moving standalone _lib wrapper functions directly into CactusModel/CactusIndex * Removed redundant example_context_manager.py and uses example.py instead * Updated README.md to reflect CactusModel as primary Python FFI API * Ensured zero impact on cli.py and core executable * Stripped docstrings to adhere strictly to contributing guidelines Signed-off-by: yogyam <[email protected]> * retain old Signed-off-by: HenryNdubuaku <[email protected]> --------- Signed-off-by: yogyam <[email protected]> Signed-off-by: HenryNdubuaku <[email protected]> Co-authored-by: HenryNdubuaku <[email protected]>
…actus-compute#412) * feat(python): add context manager support for safe resource cleanup Signed-off-by: yogyam <[email protected]> * docs: add context manager migration guide to python README Signed-off-by: yogyam <[email protected]> * style: trim verbose comments to match project guidelines Signed-off-by: yogyam <[email protected]> * refactor: restore original example.py, add separate context manager example Signed-off-by: yogyam <[email protected]> * refactor(python): encapsulate C-bindings into CactusModel context manager * Remedied PR bloat by moving standalone _lib wrapper functions directly into CactusModel/CactusIndex * Removed redundant example_context_manager.py and uses example.py instead * Updated README.md to reflect CactusModel as primary Python FFI API * Ensured zero impact on cli.py and core executable * Stripped docstrings to adhere strictly to contributing guidelines Signed-off-by: yogyam <[email protected]> * retain old Signed-off-by: HenryNdubuaku <[email protected]> --------- Signed-off-by: yogyam <[email protected]> Signed-off-by: HenryNdubuaku <[email protected]> Co-authored-by: HenryNdubuaku <[email protected]>
What
Based of issue #388
Added
CactusModelandCactusIndexcontext manager classes to the Python bindings socactus_destroy/cactus_index_destroyare called automatically — even when exceptions occur.Both classes:
destroy()twice is safe)RuntimeErroron use-after-destroyFFI Audit
Reviewed
cactus_ffi.h.Three handle types allocate native memory:
cactus_model_tcactus_index_tcactus_stream_transcribe_tcactus_model_tandcactus_index_treceive wrappers.cactus_stream_transcribe_tis a sub-resource of a model (freed bycactus_stream_transcribe_stop), so no separate wrapper is needed.Files Changed
python/src/cactus.pyAdded
CactusModelandCactusIndexclasses (appended; no existing code modified)python/src/__init__.pyAdded lazy imports for new classes
python/example.pyUpdated to use the
withpatternpython/tests/test_context_manager.pyAdded 8 unit tests (mocked FFI; no model weights required)
Tests
$ python -m unittest python/tests/test_context_manager.py -v Ran 8 tests in 0.002s — OKTests verify:
destroyis called on normal exitdestroyis called on exceptionManual
leaks --atExittesting requires a full build + model weights. Happy to add a demo script if needed.Backward Compatibility
Fully backward-compatible.
All existing free functions remain unchanged. The new classes are purely additive.