GeneSegNet: a deep learning framework for cell segmentation by integrating gene expression and imaging. Genome Biology
- Create a conda environment:
conda create -n GeneSegNet python=3.8
conda activate GeneSegNet
- Install PyTorch 1.12.1:
conda install pytorch==1.12.1 torchvision==0.13.1 torchaudio==0.12.1 cudatoolkit=11.3 -c pytorch
Note: the above command may not match your CUDA environment; check https://pytorch.org/get-started/previous-versions/#v1121 to find the command that matches your CUDA version.
- Clone the repository, use:
git clone https://github.com/BoomStarcuc/GeneSegNet.git
- Install dependencies, use:
pip install -r requirement.txt
- Download the demo training datasets from Google Drive and unzip them into your project directory.
- Download the GeneSegNet pre-trained model from Google Drive and put it into your project directory.
Directory structure of the initial input data. See the hippocampus demo datasets on Google Drive.
your raw dataset
|-images
| |-image sample 1
| |-image sample 2
| |-...
|-labels
| |-label sample 1
| |-label sample 2
| |-...
|-spots
| |-spot sample 1
| |-spot sample 2
| |-...
After preprocessing, you will obtain a dataset that is not yet split into training, validation, and test sets, as follows:
your preprocessed dataset
|-sample 1
| |-HeatMaps
| | |-HeatMap
| | |-HeatMap_all
| |-images
| |-labels
| |-spots
|-sample 2
| |-HeatMaps
| | |-HeatMap
| | |-HeatMap_all
| |-images
| |-labels
| |-spots
|-...
Please see the preprocessed hippocampus demo datasets on Google Drive.
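Before training, it can help to verify that every preprocessed sample folder has the layout shown above. This is a minimal sketch, not part of GeneSegNet; the `check_preprocessed` helper is illustrative and simply mirrors the directory structure listed here:

```python
from pathlib import Path

# Required subdirectories per sample, matching the preprocessed layout above.
REQUIRED = ["HeatMaps/HeatMap", "HeatMaps/HeatMap_all", "images", "labels", "spots"]

def check_preprocessed(root):
    """Return {sample_name: [missing subdirectories]} for incomplete samples."""
    problems = {}
    for sample in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        missing = [d for d in REQUIRED if not (sample / d).is_dir()]
        if missing:
            problems[sample.name] = missing
    return problems
```

An empty result means every sample folder is complete.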
If you use the demo training dataset we provide, you can skip this section. To train on your own dataset, first run the preprocessing code in the preprocess directory so that your data matches the structure expected during training.
python Generate_Image_Label_locationMap.py
Note: base_dir and save_crop_dir need to be set to your corresponding paths.
You will need to split the output of the preprocessing step into training, validation, and test sets in reasonable proportions. The structure of the dataset should be as follows:
your split dataset
|-train
| |-sample 1
| | |-HeatMaps
| | | |-HeatMap
| | | |-HeatMap_all
| | |-images
| | |-labels
| | |-spots
| |-sample 2
| |-...
|-val
| |-sample 3
| | |-HeatMaps
| | | |-HeatMap
| | | |-HeatMap_all
| | |-images
| | |-labels
| | |-spots
| |-sample 4
| |-...
|-test
| |-sample 5
| | |-HeatMaps
| | | |-HeatMap
| | | |-HeatMap_all
| | |-images
| | |-labels
| | |-spots
| |-sample 6
| |-...
Please see the demo training dataset on Google Drive. You can then start training your model with the following command.
After training, the algorithm will save the trained model to your specified path.
To run the algorithm on your data, use:
python -u GeneSeg_train.py --use_gpu --train_dir <training dataset path> --val_dir <validation dataset path> --test_dir <test dataset path> --pretrained_model None --save_png --save_each --img_filter _image --mask_filter _label --all_channels --verbose --metrics --dir_above --save_model_dir <save model path>
Here:
- use_gpu: use the GPU if torch with CUDA is installed.
- train_dir: folder containing the training data.
- val_dir: folder containing the validation data.
- test_dir: folder containing the test data used to validate training results.
- img_filter, mask_filter, heatmap_filter: filename suffixes for images, cell instance masks, and heat maps.
- pretrained_model: model to use for inference or as a starting point for training.
- chan: number of input channels (default 2 or 4).
- verbose: print information about the run and settings, and save it to a log.
- save_each: save the model every n epochs for later comparison.
- save_png: save masks as PNG and outlines as a text file for ImageJ.
- metrics: compute the segmentation metrics.
- save_model_dir: directory in which the trained model is saved.
To see the full list of command-line options run:
python GeneSeg_train.py --help
The input is your test dataset.
your test dataset
|-test
| |-sample 5
| | |-HeatMaps
| | | |-HeatMap
| | | |-HeatMap_all
| | |-images
| | |-labels
| | |-spots
| |-sample 6
| |-...
The output will include the following two images: 1) the predicted cell instance masks; 2) the cell boundary comparison plot between predicted results and training labels.
To test a trained or pre-trained model, use:
python GeneSeg_test.py --use_gpu --test_dir <test dataset path> --pretrained_model <your trained model> --save_png --img_filter _image --mask_filter _label --all_channels --metrics --dir_above --output_filename <a folder name>
Note: to run a pre-trained model, first download the provided pre-trained model.
The input to network inference is your raw dataset. See the hippocampus demo datasets on Google Drive.
your raw dataset
|-images
| |-image sample 1
| |-image sample 2
| |-...
|-labels
| |-label sample 1
| |-label sample 2
| |-...
|-spots
| |-spot sample 1
| |-spot sample 2
| |-...
The output of network inference includes four files per sample, as follows:
|-HeatMap
| |-sample 1
| |-sample 2
|- predicted full-resolution .mat file for sample 1
|- predicted full-resolution .png file for sample 1
|- predicted full-resolution .jpg file for sample 1
|- predicted full-resolution .mat file for sample 2
|- predicted full-resolution .png file for sample 2
|- predicted full-resolution .jpg file for sample 2
|-...
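The .mat and image files above store the predicted instance masks. Assuming the common encoding where 0 is background and each positive integer labels one cell (an assumption; check your own files), a small illustrative sketch for summarizing such a mask:

```python
import numpy as np

def mask_summary(mask):
    """Summarize an instance label mask (assumed: 0 = background, k > 0 = cell k).

    Returns the number of cells and a bounding box (y0, y1, x0, x1),
    inclusive, for each cell id."""
    labels = np.unique(mask)
    cell_ids = labels[labels > 0]  # drop the background label
    boxes = {}
    for k in cell_ids:
        ys, xs = np.nonzero(mask == k)
        boxes[int(k)] = (ys.min(), ys.max(), xs.min(), xs.max())
    return len(cell_ids), boxes
```

The same summary works whether the mask came from the .mat file (e.g. via scipy.io.loadmat) or from a label image, as long as it is an integer array with the encoding assumed above.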
To obtain the final full-resolution segmentation results, use slidingwindows_gradient.py in the Inference directory:
python slidingwindows_gradient.py
Note: root_dir, save_dir, and model_file need to be set to your corresponding paths.
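For intuition, the crop coordinates of a sliding-window pass over a full-resolution image can be generated as in this illustrative sketch. The window size, stride, and clamping behavior here are assumptions for demonstration; slidingwindows_gradient.py may use different parameters and its own blending of overlapping predictions:

```python
def sliding_windows(height, width, win=256, stride=128):
    """Generate (y0, y1, x0, x1) crop coordinates tiling a full-resolution
    image with overlapping windows. The last row/column is clamped so every
    window lies fully inside the image (assumes height, width >= win)."""
    ys = sorted({min(y, height - win) for y in range(0, height, stride)})
    xs = sorted({min(x, width - win) for x in range(0, width, stride)})
    return [(y, y + win, x, x + win) for y in ys for x in xs]
```

With stride < win, adjacent windows overlap, which is what lets a stitching step smooth predictions across window borders.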
There are two types of input as follows:
1. your raw spot dataset
|-spots
| |-spot sample 1
| |-spot sample 2
| |-...
2. your output of the network inference
|-HeatMap
| |-sample 1
| |-sample 2
|- predicted full-resolution .mat file for sample 1
|- predicted full-resolution .png file for sample 1
|- predicted full-resolution .jpg file for sample 1
|- predicted full-resolution .mat file for sample 2
|- predicted full-resolution .png file for sample 2
|- predicted full-resolution .jpg file for sample 2
|-...
The output is a .csv file with four columns (cell_id, spotX, spotY, and gene), so that each gene spot is assigned to its unique corresponding cell.
cell_id  spotX  spotY  gene
0        213    419    Pvalb
0        248    442    Gad1
1        1212   18     Plp1
...      ...    ...    ...
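Downstream analysis can read this mapping back with the standard library. A minimal sketch, assuming a comma-separated file with the four-column header shown above:

```python
import csv
from collections import defaultdict

def genes_per_cell(csv_path):
    """Group gene spots by cell_id from the mapping .csv
    (columns: cell_id, spotX, spotY, gene)."""
    cells = defaultdict(list)
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            cells[int(row["cell_id"])].append(row["gene"])
    return dict(cells)
```

From here it is straightforward to build a per-cell gene count matrix for expression analysis.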
python generate_MappingRelationships.py
Note: spot_dir, label_dir, and save_dir need to be set to your corresponding paths.
If you find our work useful for your research, please consider citing the following paper.
@article{wang2023genesegnet,
title={GeneSegNet: a deep learning framework for cell segmentation by integrating gene expression and imaging},
author={Wang, Yuxing and Wang, Wenguan and Liu, Dongfang and Hou, Wenpin and Zhou, Tianfei and Ji, Zhicheng},
journal={Genome Biology},
volume={24},
number={1},
pages={235},
year={2023},
publisher={Springer}
}
