Welcome to the Sparse GEMINI repo. This repo contains the official implementation of the Sparse GEMINI algorithm, as well as all the commands needed to reproduce the experiments.
If you intend to use Sparse GEMINI on small datasets, you may prefer the NumPy implementation in GemClus, which provides a ready-to-use API.
If you are only interested in using the Sparse GEMINI model, you can simply download the *.py files in the sparse_gemini folder.
The packages required to run the model are described in requirements.txt. Once installed, the command
python sparse_gemini/main.py -h
will give details on all parameters that can be used to toy with the model.
The remainder of the folders are dedicated to reproducing the experiments of our article:
- utils contains additional scripts for dataset creation, result gathering.
- analysis contains the necessary script to re-obtain the contents of the figures.
- configs contains a set of configuration files for each different experiment.
Several scripts in the utils folder are useful for the analysis or for the Snakemake pipeline during experiments:
- compute_distances
- create_dataset
- extract_common_features
- extract_logreg_feature_history
- extract_mnist_feature_importance
- fetch_mnist
- fetch_openml
- merge_clustering
- merge_clusterings_v2
- merge_selections
- retrieve_optimal_solutions
If you are interested in replicating some of the experiments, please refer to the main file How to redo all experiments.md, which lists the step-by-step command lines for all experiments.