tuned-org-uk/genegraph-storage

genegraph-storage

A storage layer for graph-based vector databases. It currently implements only the Lance file format; Parquet and other formats can be added by implementing the StorageBackend trait.

Provided functionality:

  • save_metadata, load_metadata: a simple wrapper around all the data in the storage directory
  • save_*: dense matrices, sparse matrices, vectors, lambdas, indices
  • load_*: dense matrices, sparse matrices, vectors, lambdas, indices
  • assorted helper utilities

It serves as the storage layer for:

  • javelin-tui: a text interface for a graph-based vector database
  • arrowspace: the next iteration of vector search

Usage

cargo add genegraph_storage

Simple example:

use genegraph_storage::lance::LanceStorage;
use genegraph_storage::metadata::GeneMetadata;
use genegraph_storage::traits::StorageBackend;
use smartcore::linalg::basic::arrays::Array2; // brings from_iterator into scope
use smartcore::linalg::basic::matrix::DenseMatrix;

// The calls below use .await, so run them inside an async function.

// instantiate a storage rooted at /tmp
// `name` is reused for the metadata below (assumed to match the storage name)
let name = "basic_test";
let storage = LanceStorage::new(
    "/tmp".to_string(),
    name.to_string(),
);

// some 2D data: 5 items with 5 features each
let dense: Vec<Vec<f64>> = vec![
    vec![0.1, 0.4, 0.5, 0.2, 0.9],
    vec![0.4, 0.5, 0.2, 0.9, 0.3],
    vec![0.03, 0.8, 0.56, 0.2, 0.9],
    vec![0.1, 0.4, 0.5, 0.34, 0.9],
    vec![0.05, 0.4, 0.2, 0.3, 0.7],
];

// flatten the rows into a single iterator to build the matrix
let (nitems, nfeatures) = (dense.len(), dense[0].len());
let data = DenseMatrix::<f64>::from_iterator(
    dense.iter().flatten().copied(), nitems, nfeatures, 0);

// Save metadata FIRST to initialize the storage directory
let md = GeneMetadata::seed_metadata(name, nitems, nfeatures, &storage.clone())
    .await
    .unwrap()
    .with_dimensions(nitems, nfeatures);
let md_path = storage.save_metadata(&md).await.unwrap();

// your data is saved in an efficient format
storage
    .save_dense("my_dataset", &data, &md_path)
    .await
    .unwrap();

// Loading back
let loaded = storage.load_dense("my_dataset").await.unwrap();
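
The example reads the feature count from dense[0] and flattens the rows directly, which panics on empty input and silently misaligns values if the rows have different lengths. A minimal sketch of a shape check that could run before building the matrix is shown below. It uses only the standard library, and flatten_rows is a hypothetical helper, not part of genegraph_storage.

```rust
/// Flattens nested rows into one contiguous row-major buffer and returns
/// (values, nitems, nfeatures). Hypothetical helper for illustration only.
/// Fails on empty input, zero-width rows, or rows of differing lengths.
fn flatten_rows(rows: &[Vec<f64>]) -> Result<(Vec<f64>, usize, usize), String> {
    let nitems = rows.len();
    // Take the feature count from the first row, if there is one.
    let nfeatures = rows
        .first()
        .map(Vec::len)
        .ok_or_else(|| "input has no rows".to_string())?;
    if nfeatures == 0 {
        return Err("rows have no features".to_string());
    }
    // Reject ragged input: every row must match the first row's length.
    if let Some((i, row)) = rows
        .iter()
        .enumerate()
        .find(|(_, r)| r.len() != nfeatures)
    {
        return Err(format!(
            "row {i} has {} values, expected {nfeatures}",
            row.len()
        ));
    }
    // Concatenate the rows in order.
    let values: Vec<f64> = rows.iter().flatten().copied().collect();
    Ok((values, nitems, nfeatures))
}
```

The returned buffer and dimensions could then be passed to the matrix constructor in place of the inline dense.iter().flatten() expression.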

Extending and traits

Every custom definition of a Lance database (a store, manifold, or data cube) should implement the Metadata trait (or reuse GeneMetadata) and the StorageBackend trait, as LanceStorageGraph does with GeneMetadata and LanceStorage.

The traits in the traits module can also be reused to implement other formats. A new format implements StorageBackend, optionally through format-specific child traits in the same way LanceStorage does. Paired with a custom Metadata implementation, that backend forms a database, so every database is simply a StorageBackend plus a Metadata.
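
To illustrate the "backend plus metadata" composition, here is a minimal, self-contained sketch. The traits MetadataSketch and BackendSketch are deliberately simplified, synchronous stand-ins invented for this example. They are not the crate's actual Metadata and StorageBackend definitions, whose methods are async and whose signatures may differ. MemoryStorage, SimpleMetadata, and Database are likewise hypothetical.

```rust
use std::collections::HashMap;

/// Simplified stand-in for a metadata trait (hypothetical).
trait MetadataSketch {
    fn name(&self) -> &str;
    fn dimensions(&self) -> (usize, usize);
}

/// Simplified, synchronous stand-in for a storage backend trait (hypothetical).
trait BackendSketch {
    fn save_dense(&mut self, key: &str, values: &[f64]) -> Result<(), String>;
    fn load_dense(&self, key: &str) -> Result<Vec<f64>, String>;
}

struct SimpleMetadata {
    name: String,
    nitems: usize,
    nfeatures: usize,
}

impl MetadataSketch for SimpleMetadata {
    fn name(&self) -> &str {
        &self.name
    }
    fn dimensions(&self) -> (usize, usize) {
        (self.nitems, self.nfeatures)
    }
}

/// An in-memory "format": a real backend would write files instead.
#[derive(Default)]
struct MemoryStorage {
    blobs: HashMap<String, Vec<f64>>,
}

impl BackendSketch for MemoryStorage {
    fn save_dense(&mut self, key: &str, values: &[f64]) -> Result<(), String> {
        self.blobs.insert(key.to_string(), values.to_vec());
        Ok(())
    }
    fn load_dense(&self, key: &str) -> Result<Vec<f64>, String> {
        self.blobs
            .get(key)
            .cloned()
            .ok_or_else(|| format!("no dataset named {key}"))
    }
}

/// A database is a backend paired with metadata.
struct Database<S: BackendSketch, M: MetadataSketch> {
    storage: S,
    metadata: M,
}

impl<S: BackendSketch, M: MetadataSketch> Database<S, M> {
    /// Saves a flat matrix after checking it against the metadata's shape.
    fn save(&mut self, key: &str, values: &[f64]) -> Result<(), String> {
        let (nitems, nfeatures) = self.metadata.dimensions();
        if values.len() != nitems * nfeatures {
            return Err(format!(
                "{}: expected {} values, got {}",
                self.metadata.name(),
                nitems * nfeatures,
                values.len()
            ));
        }
        self.storage.save_dense(key, values)
    }

    fn load(&self, key: &str) -> Result<Vec<f64>, String> {
        self.storage.load_dense(key)
    }
}
```

In this sketch the metadata supplies the expected shape and the backend only moves bytes, so swapping MemoryStorage for another format would leave Database unchanged.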

Contributing

See the .github/ directory.

About

A minimal storage layer for storing large matrices and embeddings
