All notable changes to the Python TileDB-SOMA project will be documented in this file (related: TileDB-SOMA R API changelog).
The format is based on Keep a Changelog.
- [#4299] Use a global memory budget for read operations instead of a per-column memory budget. The global memory budget is split across columns depending on the type and characteristics of each column. The global memory budget is behind a feature flag, is disabled by default, and can be enabled by setting `soma.read.use_memory_pool`.
- [#4363] Add new `SOMAContext` class that replaces the `SOMATileDBContext` class. `PlatformConfig` and `PlatformSchemaConfig` now support `to_dict()` and have a readable `__repr__` for diagnostic logging.
- [#4421] The maximum domains for integral dimensions have changed for int32, int64, and uint64 columns and are now $(-2^{31} + 1, 2^{31} - 2)$, $(-2^{63} + 1, 2^{63} - 2)$, and $(0, 2^{63} - 2)$, respectively.
- [#4299] `ManagedQuery` reuses the same buffers for each incomplete read and allocates dedicated buffers when converting to Arrow.
- [#4294] Use vcpkg instead of a custom superbuild for installing libtiledbsoma and its dependencies. CI workflows now use the prebuilt libtiledbsoma.
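
The integral limits quoted in [#4421] can be checked with plain Python integers. This is only an arithmetic sketch of those bounds; the name `MAX_DOMAIN` is illustrative and is not a tiledbsoma identifier.

```python
# Arithmetic sketch of the maximum domains listed in [#4421].
# MAX_DOMAIN is an illustrative name, not part of the tiledbsoma API.
MAX_DOMAIN = {
    "int32": (-(2**31) + 1, 2**31 - 2),
    "int64": (-(2**63) + 1, 2**63 - 2),
    "uint64": (0, 2**63 - 2),
}

for dtype, (lower, upper) in MAX_DOMAIN.items():
    # Print each inclusive (lower, upper) pair for inspection.
    print(f"{dtype}: ({lower}, {upper})")
```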
- [#4363] (BREAKING) The `context` property of a `SOMAObject` now returns a `SOMAContext` instead of a `SOMATileDBContext`.
- [#4636] Deprecate `SOMATileDBContext` class in favor of the new `SOMAContext` class.
- [#4431] Remove deprecated support for allowing a dimension in `shape` to be `None` in the `SparseNDArray`.
- [#4431] Remove deprecated support for setting `domain=None` in the `create` method for `DataFrame`, `PointCloudDataFrame`, and `GeometryDataFrame`.
- [#4448] Remove the deprecated `"resume"` ingest mode from `tiledbsoma.io.from_h5ad`, `tiledbsoma.io.from_anndata`, `tiledbsoma.io.add_X_layer`, and `tiledbsoma.io.add_matrix_to_collection`. The recommended approach for recovering from a failed ingestion is to delete the partially written SOMA Experiment and restart from the original input files or a known-good backup.
- [#4448] Remove the deprecated behavior of deleting collection members in write mode (`mode="w"`). Use `mode="d"` instead.
- `PlatformConfig.dense_nd_array_dim_zstd_level` was incorrectly aliased to `sparse_nd_array_dim_zstd_level` and is now bound to the correct field.
- [#4359] Fix unsafe casting of data on write when the input data type in a PyArrow table or batch does not match the existing schema.
- [#4324] Update TileDB core version to 2.30.0.
- [#4293] The `SOMAObject` class is no longer a generic class over the handle `Wrapper` internal class.
- [#4258] Add optional schema validation to `tiledbsoma.io.add_X_layer`, which will generate an error if the provided matrix does not match the Experiment shape. Validation is enabled by default for any dataset created with SOMA 1.15 or later.
Only R API updates in this release.
- [#4309] Update TileDB core to 2.29.2.
This release adds warnings for new deprecations in the allowed values for shape and domain in create methods and updates the TileDB core version to 2.29.1.
- [#4284] Update TileDB core to 2.29.1.
- [#4275] Deprecate allowing a dimension in `shape` for a new `SOMASparseNDArray` to be defaulted to `1` with `None` in the `create` method. In the future, the shape must be a sequence of positive integers.
- [#4275] Deprecate leaving `domain=None` when creating a `SOMADataFrame`, `SOMAPointCloudDataFrame`, or `SOMAGeometryDataFrame`. In the future, the domain must be fully specified on creation.
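
As a rough illustration of the future rule for `shape` described in [#4275], the check below accepts only sequences of positive integers. `validate_shape` is a hypothetical helper written for this note, not a tiledbsoma function, and the library's own validation may differ.

```python
from collections.abc import Sequence


def validate_shape(shape: Sequence[object]) -> tuple[int, ...]:
    """Return shape as a tuple, rejecting None and non-positive entries.

    Hypothetical helper; it only mirrors the rule stated in the changelog.
    """
    checked = []
    for i, extent in enumerate(shape):
        # bool is a subclass of int, so exclude it explicitly.
        if isinstance(extent, bool) or not isinstance(extent, int) or extent <= 0:
            raise ValueError(f"shape[{i}] must be a positive integer, got {extent!r}")
        checked.append(extent)
    return tuple(checked)
```

Under this sketch, `validate_shape((100, 200))` passes, while `validate_shape((100, None))` raises `ValueError` instead of silently defaulting the dimension to `1`.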
This release is the first TileDB-SOMA release that follows our new versioning policy (see the developer docs). Highlights of this release include the addition of a "delete" feature, removal of deprecated functions, an update to TileDB 2.28.1, and breaking changes to ExperimentAxisQuery.to_anndata.
- [#4125] Add delete mode specified by `mode='d'`.
- [#4205] Add `delete_cells` method to `SparseNDArray`, `DataFrame`, and `PointCloudDataFrame`.
- [#4212] Add `type` read-only property to `DenseNDArray` and `SparseNDArray`.
- [#4215] Add `var_axis_delete` and `obs_axis_delete` to `Experiment`.
- [#4126] [python] At package import time, validate that the expected TileDB version is installed and used. Raises a RuntimeError exception if the condition is not met. This is an attempt to better warn users who have corrupted conda installations.
- [#4137] [python][BREAKING] Add user-specified obs/var index column to `ExperimentAxisQuery.to_anndata`. This change will also set the index columns based upon metadata hints, following the conventions of `tiledbsoma.io`. See the docstrings for more details. Prior to this change, the index dtype would be set to `string` in all cases; this change removes the forced cast and leaves the column type as-is.
- [#4177] Update TileDB core to 2.28.1.
- [#4209], [#4220] DataFrame columns of dictionary type, with a `large_string` or `large_binary` value type, were incorrectly reported as an Arrow `string`. They are now correctly reported as dictionary-typed fields with a value type of `large_string` and `large_binary`, respectively. NB: all string/binary types are automatically up-cast to their large variant in tiledbsoma.
- [#4250] Make X an optional argument for `ExperimentAxisQuery.to_anndata` in parity with `tiledbsoma.io.to_anndata`.
- [#4125] Deprecate removing elements from a collection in write mode. In the future, all new removals will need to be done in delete mode.
- [#4245] Deprecate unused exception `NotCreateableError`.
- [#4240] Remove deprecated functions `tiledbsoma_build_index`, `io.append_obs`, `io.append_var`, `io.append_X`, and `io.create_from_matrix`. Remove the deprecated method `config_options_from_schema` from `SOMAArray`.
- [#4139] [python] ExperimentAxisQuery.to_anndata would export obsm/varm as float32, regardless of the underlying SOMA data type. With this fix, the exported matrix will have the same data type as the original data.
- [#4147] [python] Fix a race condition in SOMA collection caching which would result in redundant object opens.
- [#4223] [python] Fix race condition in H5AD reading in the `tiledbsoma.io` module.
The primary changes are modifications and deprecations to the tiledbsoma.io ingestion methods. Also introduced an option for writing pre-sorted data via TileDB global order writes.
- [#3983] [python] Multiple writes of pre-sorted data may now be written to a single fragment using TileDB global order writes. Enable this performance optimization by setting the platform_config parameter `sort_coords` to `False` in the call to write. An error is raised if data is not written in global sort order.
- [#4086] [python] Add new parameter `allow_duplicate_obs_ids` to the `tiledbsoma.io` functions `register_anndatas` and `register_h5ads`. When `False` (default), an error will be raised if there are any duplicate `obs` IDs in the provided SOMA Experiment or AnnData objects. Set the parameter to `True` for legacy behavior. ID handling on the `var` axis is unchanged.
- [#4108] [python] Improve performance of `tiledbsoma.io.from_anndata` and `from_h5ad` when appending groups of AnnData known to have no duplicate obs axis IDs.
- [#4106] [python][BREAKING] The `SOMAObject.reopen` method now modifies the original `SOMAObject` in place (flushes data to disk and reopens with the requested timestamp and mode) and returns a reference to itself instead of flushing data to disk and opening a new object.
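
To make the duplicate check in [#4086] concrete, the sketch below finds `obs` IDs that occur more than once across several inputs. It is a standalone illustration, assuming IDs are plain strings; `find_duplicate_obs_ids` is a hypothetical name and this is not how `register_anndatas` is necessarily implemented.

```python
from collections import Counter
from collections.abc import Iterable


def find_duplicate_obs_ids(id_batches: Iterable[Iterable[str]]) -> list[str]:
    """Return the sorted obs IDs that appear more than once across all batches.

    Hypothetical helper for illustration only.
    """
    # Count every ID across every batch (e.g. one batch per AnnData object).
    counts = Counter(obs_id for batch in id_batches for obs_id in batch)
    return sorted(obs_id for obs_id, n in counts.items() if n > 1)
```

A non-empty result corresponds to the situation in which the default `allow_duplicate_obs_ids=False` raises an error.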
- [#4081] [python] The `tiledbsoma.io` functions `append_obs`, `append_var`, and `append_X` are deprecated and will be removed in a future release. It is recommended to use tiledbsoma.io.from_anndata (with a registration map from tiledbsoma.io.register_anndatas or tiledbsoma.io.register_h5ads) for appending new, complete AnnData objects to an Experiment.
- [#4082] [python] `tiledbsoma.io.create_from_matrix` is deprecated and will be removed in a future release. To add a new matrix as a layer within an existing SOMA Experiment (e.g., to X, obsm, varm), please use the more specific functions tiledbsoma.io.add_X_layer or tiledbsoma.io.add_matrix_to_collection. If you need to create a standalone SOMA NDArray outside of a pre-defined Experiment structure, please use the direct SOMA API constructors, such as tiledbsoma.SparseNDArray.create.
- [#4083] [python] "resume" mode in tiledbsoma.io ingestion methods is deprecated and will be removed in a future release. This includes from_anndata, from_h5ad, and related ingest functions. The recommended and safest approach for recovering from a failed ingestion is to delete the partially written SOMA Experiment and restart the ingestion process from the original input files or a known-good backup.
- [#4071] [python] A `tiledb_timestamp` with a value of zero is now equivalent to an unspecified timestamp (or `None`), and will be a synonym for "current time". Prior to this fix, a zero-valued timestamp would generate errors or unpredictable results.
- [#4103] [python] Do not attempt to resize empty measurements in tiledbsoma.io.prepare_experiment. This fixes a bug where calling `prepare_experiment` would fail if the Experiment contained any empty Measurements.
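
The timestamp rule in [#4071] can be pictured as a small normalization step. This is a minimal sketch, assuming integer timestamps in milliseconds since the Unix epoch; `effective_timestamp_ms` is a hypothetical helper, not a tiledbsoma function.

```python
import time
from typing import Optional


def effective_timestamp_ms(tiledb_timestamp: Optional[int]) -> int:
    """Map None or 0 to the current time; pass other values through unchanged.

    Hypothetical helper; assumes millisecond timestamps.
    """
    if tiledb_timestamp is None or tiledb_timestamp == 0:
        # Zero and None are both treated as "current time".
        return int(time.time() * 1000)
    return tiledb_timestamp
```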
The primary change in 1.17.0 is the upgrade to TileDB 2.28.
- [#3740] [python] Add experimental Dask-backed `to_anndata` functionality to `SparseNDArray` and `ExperimentAxisQuery`.
- [#4057] [c++] Update TileDB core to 2.28.0
- [#4023] [c++] Use nanoarrow ArrowSchemaSetTypeDateTime for datetime values. Dictionary type with timestamp value type will raise error on read.
- [#4040] [python] suppress insignificant overflow warning from numpy.
- [#4050] DataFrame `count` and SparseNDArray `nnz` fix: report the correct number of cells in the array when a delete query had previously been applied.
- [#4055] Various `open()` code paths failed to check the SOMA encoding version number, and would fail with cryptic errors.
- [#4066] Fix various memory leaks related to releasing Arrow structures when transferring ownership between C++ and Python and vice versa.
- [#4031] [python] Storage paths generated from collection keys are now URL-escaped if they contain characters outside the safe set (`a-zA-Z0-9-_.()^!@+={}~'`). Additionally, the special names `..` and `.` are now prohibited.
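
The escaping rule in [#4031] can be approximated with the standard library's `urllib.parse.quote`. This is a sketch under stated assumptions rather than the tiledbsoma implementation: `escape_collection_key` and `SAFE_EXTRA` are hypothetical names, and the exact escaping the library produces may differ in detail.

```python
from urllib.parse import quote

# Punctuation from the safe set quoted in the changelog entry.
# ASCII letters and digits are always left unescaped by quote().
SAFE_EXTRA = "-_.()^!@+={}~'"


def escape_collection_key(key: str) -> str:
    """Percent-encode characters outside the safe set (hypothetical helper)."""
    if key in (".", ".."):
        # The special names "." and ".." are prohibited outright.
        raise ValueError(f"collection key {key!r} is not allowed")
    # Passing safe= replaces quote()'s default safe="/", so "/" is escaped too.
    return quote(key, safe=SAFE_EXTRA)
```

With this sketch, a key such as `X_(log1p)` is returned unchanged, while a space or a slash in a key is percent-encoded.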
TileDB-SOMA Python releases prior to 1.17.0 are documented in the TileDB-SOMA GitHub Releases.