Satoshi Tsutsui, Winnie Pang, Shuting He, and Bihan Wen, “WBCAtt+: Fine-Grained Pixel-Level Morphological Annotations for White Blood Cell Images,” Medical Image Analysis, 2026.
This repository provides the code to reproduce the experiments reported in the above paper. If you simply want to use a WBC segmentation model trained on our dataset, please check the following repository: https://github.com/apple2373/wbcsegmentor
- Download `pbcseg_final_v1.tar` and put it into `data/PBC`.
- See `pbcseg_final_v1_viz.ipynb` for sample code to visualize segmentation masks.
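
As a quick sanity check on a mask, the sketch below computes the fraction of pixels assigned to each label. It assumes each mask is a single-channel label image whose pixel values are integer component IDs. The actual file format and the mapping from IDs to cell components are defined by the dataset and `pbcseg_final_v1_viz.ipynb`, not here. `component_fractions` is a hypothetical helper.

```python
from collections import Counter
from typing import Dict, List


def component_fractions(mask: List[List[int]]) -> Dict[int, float]:
    """Fraction of pixels carrying each integer label in a 2-D mask.

    Assumption: `mask` is a 2-D grid of integer component IDs; what each ID
    means is dataset-specific and not defined here.
    """
    counts: Counter = Counter()
    total = 0
    for row in mask:
        counts.update(row)   # count label occurrences in this row
        total += len(row)
    if total == 0:
        raise ValueError("mask is empty")
    return {label: n / total for label, n in sorted(counts.items())}


if __name__ == "__main__":
    # Toy 2x4 mask: 0 = background, 1 and 2 = two hypothetical components.
    toy = [[0, 0, 1, 1],
           [0, 2, 2, 1]]
    print(component_fractions(toy))  # {0: 0.375, 1: 0.375, 2: 0.25}
```

If a real mask file is indeed a single-channel label image, it could be passed in as `np.array(Image.open(path)).tolist()` using Pillow and NumPy.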

- `pbc_attr_v1_ccrop_train.csv`, `pbc_attr_v1_ccrop_val.csv`, and `pbc_attr_v1_ccrop_test.csv` contain attribute annotations for the train/val/test splits, with the following columns:
  - `cell_size`, `cell_shape`, `nucleus_shape`, `nuclear_cytoplasmic_ratio`, `chromatin_density`, `cytoplasm_vacuole`, `cytoplasm_texture`, `cytoplasm_colour`, `granule_type`, `granule_colour`, `granularity`: The attribute columns.
  - `img_name`: The image file name. It can serve as a unique identifier.
  - `path`: The image file name. It can serve as a unique identifier.
  - `label`: One of the five WBC types (neutrophils, eosinophils, basophils, monocytes, and lymphocytes) provided by the PBC dataset.
- `pbc_attr_v1_ccrop_all.csv` combines all splits into one CSV, with a `split` column indicating the split of each row.
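
The sketch below loads the combined CSV and tallies one attribute per split, using only the Python standard library. It assumes the file is at `./dataset_txt/pbc_attr_v1_ccrop_all.csv`, the path used in the `predict_seg.py` command. It only relies on the `split` column and the attribute columns listed above. `attribute_counts` and `load_rows` are hypothetical helpers.

```python
import csv
import os
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


def attribute_counts(rows: Iterable[Dict[str, str]], attribute: str) -> Dict[str, Counter]:
    """Count the values of one attribute column separately for each split."""
    per_split: Dict[str, Counter] = defaultdict(Counter)
    for row in rows:
        per_split[row["split"]][row[attribute]] += 1
    return dict(per_split)


def load_rows(path: str) -> List[Dict[str, str]]:
    """Read a CSV file into a list of {column: value} dictionaries."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


if __name__ == "__main__":
    path = "./dataset_txt/pbc_attr_v1_ccrop_all.csv"  # assumed location
    if os.path.exists(path):
        for split, counts in attribute_counts(load_rows(path), "cell_size").items():
            print(split, dict(counts))
```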

- python 3.9.18
- cuda 12.1
- pytorch 2.2.2
- mmsegmentation 1.2.1
- etc.

See `environment.yml` for details. Alternatively, use the following script:
conda create --name wbcattplus python=3.9.19
conda activate wbcattplus
pip install ipykernel matplotlib pandas tqdm pillow scikit-learn seaborn numpy scipy opencv-python ipywidgets gitpython nvitop gpustat scikit-image pyyaml
pip install torch==2.2.2 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
pip install lightning==2.2.1
pip install -U openmim
mim install mmengine
mim install mmcv==2.1.0
pip install mmsegmentation==1.2.2
pip install ftfy regex torcheval kornia tensorboard 'jsonargparse[signatures]>=4.26.1' mmpretrain
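
To check that the environment matches these pins, you can query installed package versions with the standard library's `importlib.metadata`. The sketch below does this; the expected versions are copied from the install commands above, and the helper names are hypothetical. Local version labels such as `+cu121` (added by the PyTorch CUDA wheels) are stripped before comparing.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Pins copied from the install commands above.
EXPECTED = {
    "torch": "2.2.2",
    "lightning": "2.2.1",
    "mmcv": "2.1.0",
    "mmsegmentation": "1.2.2",
}


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Return the installed version of each distribution, or None if missing."""
    found: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


def version_matches(installed: Optional[str], expected: str) -> bool:
    """Compare versions, ignoring a local label such as '+cu121'."""
    return installed is not None and installed.split("+")[0] == expected


if __name__ == "__main__":
    for name, version in installed_versions(EXPECTED).items():
        status = "ok" if version_matches(version, EXPECTED[name]) else "CHECK"
        print(f"{name}: installed={version} expected={EXPECTED[name]} [{status}]")
```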
The `train_att.py` script trains the attribute prediction baselines. Example usage:
python train_att.py -h  # show the available options
python train_att.py --backbone resnet50 --use_eval_mode --batch_size 96 --agg no
python train_att.py --backbone convnext_tiny --batch_size 96 --agg no
python train_att.py --backbone vgg16 --batch_size 96 --agg no
python train_att.py --backbone vit_b_16 --batch_size 96 --agg no
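
To run all four baselines one after another, a small driver can build the same command lines. The sketch below does that; the backbones and flags are copied from the commands above. It prints the commands by default and only executes them when given `--run`. `baseline_command` and `run_all` are hypothetical helpers, not part of the repository.

```python
import subprocess
import sys
from typing import List

# Backbones and extra flags copied from the baseline commands above.
BASELINES = {
    "resnet50": ["--use_eval_mode"],
    "convnext_tiny": [],
    "vgg16": [],
    "vit_b_16": [],
}


def baseline_command(backbone: str, extra: List[str], batch_size: int = 96) -> List[str]:
    """Build the argument list for one train_att.py baseline run."""
    return [sys.executable, "train_att.py", "--backbone", backbone, *extra,
            "--batch_size", str(batch_size), "--agg", "no"]


def run_all(dry_run: bool = True) -> None:
    for backbone, extra in BASELINES.items():
        cmd = baseline_command(backbone, extra)
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)  # stop at the first failing run


if __name__ == "__main__":
    run_all(dry_run="--run" not in sys.argv)
```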
The `main_seg.py` script trains the semantic segmentation models with the following commands:
python main_seg.py fit --config ./configs/fcn.yaml
python main_seg.py fit --config ./configs/segformer.yaml
python main_seg.py fit --config ./configs/swin.yaml
python main_seg.py fit --config ./configs/convnext.yaml
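
The example checkpoint name in the testing instructions records a validation mIoU (`val_miou`). For reference, the sketch below computes the standard mean intersection-over-union from a confusion matrix over flattened masks. Classes absent from both ground truth and prediction are skipped. Whether `main_seg.py` treats background or absent classes the same way is an assumption not confirmed here. The helper names are hypothetical.

```python
from typing import List, Sequence


def confusion_matrix(gt: Sequence[int], pred: Sequence[int], num_classes: int) -> List[List[int]]:
    """cm[i][j] counts pixels whose ground-truth class is i and predicted class is j."""
    if len(gt) != len(pred):
        raise ValueError("gt and pred must have the same number of pixels")
    cm = [[0] * num_classes for _ in range(num_classes)]
    for g, p in zip(gt, pred):
        cm[g][p] += 1
    return cm


def mean_iou(cm: List[List[int]]) -> float:
    """Mean IoU over classes present in the ground truth or the prediction."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                       # class-c pixels predicted as another class
        fp = sum(cm[r][c] for r in range(n)) - tp  # other pixels predicted as class c
        denom = tp + fp + fn
        if denom > 0:                              # skip classes absent from both
            ious.append(tp / denom)
    if not ious:
        raise ValueError("confusion matrix is empty")
    return sum(ious) / len(ious)


if __name__ == "__main__":
    # Flattened toy masks with two classes.
    cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
    print(cm)            # [[1, 1], [0, 2]]
    print(mean_iou(cm))  # (1/2 + 2/3) / 2 = 0.5833...
```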
We implement Mask2Former with the detectron2 pipeline, using the official codebase with the Swin-T backbone. You can play with it here.
To test a trained model, use the following command:
python main_seg.py test --ckpt_path <path-to-the-checkpoint> --config ./configs/fcn.yaml
- `<path-to-the-checkpoint>` should be something like `./experiments/fcn/version_0/checkpoints/epoch=39-val_loss=0.44-val_miou=0.91.ckpt`.
- Replace `./configs/fcn.yaml` with the configuration of the corresponding backbone.
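
Checkpoint file names follow the pattern `epoch=<N>-val_loss=<loss>-val_miou=<miou>.ckpt`. The sketch below picks the checkpoint with the highest `val_miou` from a run directory; the result can then be passed to `--ckpt_path`. It assumes every checkpoint uses this pattern with plain decimal numbers; other files, such as a `last.ckpt`, are ignored. `best_checkpoint` and `best_in_dir` are hypothetical helpers.

```python
import re
from pathlib import Path
from typing import Iterable, Optional

# Matches e.g. "epoch=39-val_loss=0.44-val_miou=0.91.ckpt" (assumed naming pattern).
CKPT_RE = re.compile(r"epoch=(\d+)-val_loss=([\d.]+)-val_miou=([\d.]+)\.ckpt$")


def best_checkpoint(names: Iterable[str]) -> Optional[str]:
    """Return the name with the highest val_miou, or None if nothing matches."""
    best, best_miou = None, float("-inf")
    for name in names:
        m = CKPT_RE.search(name)
        if m and float(m.group(3)) > best_miou:
            best, best_miou = name, float(m.group(3))
    return best


def best_in_dir(log_dir: str) -> Optional[str]:
    """Search <log_dir>/checkpoints/*.ckpt, e.g. log_dir='./experiments/fcn/version_0'."""
    ckpt_dir = Path(log_dir, "checkpoints")
    if not ckpt_dir.is_dir():
        return None
    return best_checkpoint(sorted(str(p) for p in ckpt_dir.glob("*.ckpt")))


if __name__ == "__main__":
    print(best_in_dir("./experiments/fcn/version_0"))
```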
To predict segmentation masks, use `predict_seg.py` with the following command:
python predict_seg.py --log_path <path-to-the-log-dir> --input_csv ./dataset_txt/pbc_attr_v1_ccrop_all.csv --image_col img_name --image_dir ./data/PBC/pbcseg_final_v1/ --save_dir results/seg-pred256x --nosave_prob
- `<path-to-the-log-dir>` should be something like `./experiments/fcn/version_0/`.
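
Before running `predict_seg.py`, it can help to confirm that every entry of the `--image_col` column resolves to a file under `--image_dir`. The sketch below performs this check. It assumes the column holds file names relative to `--image_dir`, which is how the flags read; the script itself may resolve paths differently. `missing_images` is a hypothetical helper.

```python
import csv
import os
from typing import Dict, Iterable, List


def missing_images(rows: Iterable[Dict[str, str]], image_col: str, image_dir: str) -> List[str]:
    """List entries of `image_col` whose file does not exist under `image_dir`."""
    return [row[image_col] for row in rows
            if not os.path.isfile(os.path.join(image_dir, row[image_col]))]


if __name__ == "__main__":
    csv_path = "./dataset_txt/pbc_attr_v1_ccrop_all.csv"
    if os.path.exists(csv_path):
        with open(csv_path, newline="") as f:
            rows = list(csv.DictReader(f))
        missing = missing_images(rows, "img_name", "./data/PBC/pbcseg_final_v1/")
        print(f"{len(missing)} of {len(rows)} images not found")
```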
The `train_att.py` script can also train the cell structure-aware recognition models by changing the `--agg` argument, for example:
python train_att.py --backbone resnet50 --use_eval_mode --batch_size 96 --agg max
python train_att.py --backbone convnext_tiny --batch_size 96 --agg sa_plain
The aggregation layer can be selected as follows:
- `--agg max`: max pooling.
- `--agg ave`: average pooling.
- `--agg concat`: concatenation.
- `--agg wave`: weighted average pooling with learnable weights.
- `--agg sa_plain`: scaled dot-product attention (i.e., transformer-style attention).
- `--agg sa_plain_ln`: scaled dot-product attention with layer normalization (LN). LN was not used for the reported results because its effect was negligible.
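
To make the `--agg` options concrete, the NumPy sketch below shows how each option could pool a set of per-component feature vectors (for example, one vector per cell component) into a single vector. It illustrates the pooling operations under assumptions and is not the repository's implementation. The learnable `wave` weights are passed in as plain logits and normalized with a softmax. `sa_plain` is shown as single-head scaled dot-product self-attention without learned query/key/value projections, followed by average pooling. `sa_plain_ln` and `max+` are not covered.

```python
from typing import Optional

import numpy as np


def _softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    z = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def aggregate(feats: np.ndarray, mode: str, weights: Optional[np.ndarray] = None) -> np.ndarray:
    """Pool a (K, D) array of K component features into a single vector."""
    if mode == "max":
        return feats.max(axis=0)                  # (D,)
    if mode == "ave":
        return feats.mean(axis=0)                 # (D,)
    if mode == "concat":
        return feats.reshape(-1)                  # (K * D,)
    if mode == "wave":
        # `weights` stands in for learnable logits; softmax normalization is an assumption.
        logits = np.zeros(len(feats)) if weights is None else np.asarray(weights, dtype=float)
        w = _softmax(logits)
        return (w[:, None] * feats).sum(axis=0)   # (D,)
    if mode == "sa_plain":
        # Single-head self-attention without learned Q/K/V projections (assumption).
        d = feats.shape[1]
        attn = _softmax(feats @ feats.T / np.sqrt(d), axis=1)  # (K, K), rows sum to 1
        return (attn @ feats).mean(axis=0)        # attend, then average over K
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    feats = np.array([[1.0, 4.0], [3.0, 2.0]])  # K=2 components, D=2
    for mode in ["max", "ave", "concat", "wave", "sa_plain"]:
        print(mode, aggregate(feats, mode))
```

With no weights supplied, `wave` reduces to average pooling. Only `concat` makes the output size depend on the number of components.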
- Download the data
  - Download the Dataset with the backbone.
  - Place the files under `data/BM`.
  - Data split files: `bmcfive_train.csv` and `bmcfive_test.csv`.
- Run segmentation with domain adaptation
  - Use `main_adapt.py`:
  - `python main_adapt.py fit --config ./configs/bmcseg_adapt.yaml`
- After obtaining the predicted segmentation maps, use `--agg max+` for cell label prediction.
Results are summarized in:
- `./results/tab_baseline_attpred.md`: Baseline models.
- `./results/tab_segbased_attpred.md`: Cell Structure-Aware Recognition Model using ground-truth segmentation maps.
- `./results/tab_predsegbased_attpred.md`: Cell Structure-Aware Recognition Model using predicted segmentation maps.
If you find this code or data useful, please consider citing:
- Satoshi Tsutsui, Winnie Pang, Shuting He, and Bihan Wen, “WBCAtt+: Fine-Grained Pixel-Level Morphological Annotations for White Blood Cell Images,” Medical Image Analysis, 2026.
- arXiv: http://arxiv.org/abs/2605.19692
- Abstract: The microscopic examination of white blood cells (WBCs) plays a fundamental role in pathology and is essential for diagnosing blood disorders such as leukemia and anemia. To support further research on WBC images, multiple datasets have been proposed. However, they mainly annotate cell categories, and lack detailed morphological characteristics that pathologists use to explain their interpretations of cells. To address this gap, we introduce WBCAtt+, a novel dataset of WBC images densely annotated with 11 morphological attributes and five pixel-level cell components. With 113k image-level labels and 10k segmentation maps, WBCAtt+ is the first to provide comprehensive annotations for WBC images. Leveraging this dataset, we provide baseline models for attribute recognition and semantic segmentation. We also design an attribute recognition model to incorporate compositional structure of cells, further improving the recognition performance. Lastly, we showcase various applications enabled by our dataset, such as explainable AI models, including counterfactual example generation. The dataset and code are publicly available.
@article{tsutsui2026wbcattplus,
title={WBCAtt+: Fine-Grained Pixel-Level Morphological Annotations for White Blood Cell Images},
author={Tsutsui, Satoshi and Pang, Winnie and He, Shuting and Wen, Bihan},
journal={Medical Image Analysis},
year={2026}
}
- Satoshi Tsutsui, Winnie Pang, and Bihan Wen. WBCAtt: A White Blood Cell Dataset Annotated with Detailed Morphological Attributes. Advances in Neural Information Processing Systems (NeurIPS) 2023.
- arXiv: https://arxiv.org/abs/2306.13531
- Abstract: The examination of blood samples at a microscopic level plays a fundamental role in clinical diagnostics, influencing a wide range of medical conditions. For instance, an in-depth study of White Blood Cells (WBCs), a crucial component of our blood, is essential for diagnosing blood-related diseases such as leukemia and anemia. While multiple datasets containing WBC images have been proposed, they mostly focus on cell categorization, often lacking the necessary morphological details to explain such categorizations, despite the importance of explainable artificial intelligence (XAI) in medical domains. This paper seeks to address this limitation by introducing comprehensive annotations for WBC images. Through collaboration with pathologists, a thorough literature review, and manual inspection of microscopic images, we have identified 11 morphological attributes associated with the cell and its components (nucleus, cytoplasm, and granules). We then annotated ten thousand WBC images with these attributes. Moreover, we conduct experiments to predict these attributes from images, providing insights beyond basic WBC classification. As the first public dataset to offer such extensive annotations, we also illustrate specific applications that can benefit from our attribute annotations. Overall, our dataset paves the way for interpreting WBC recognition models, further advancing XAI in the fields of pathology and hematology.
@inproceedings{tsutsui2023wbcatt,
title={WBCAtt: A White Blood Cell Dataset Annotated with Detailed Morphological Attributes},
author={Tsutsui, Satoshi and Pang, Winnie and Wen, Bihan},
booktitle={Advances in Neural Information Processing Systems (NeurIPS)},
year={2023}
}