The Depth-Consistent Human Modeling (DCHM) framework enhances multiview pedestrian detection by achieving annotation-free 3D human modeling through superpixel-wise Gaussian Splatting, outperforming existing methods in challenging crowded scenes. Note: the code release is in progress.
Check our website for videos and reconstruction results!
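The core idea of superpixel-wise Gaussian Splatting is to initialize one Gaussian per superpixel rather than per pixel. A minimal numpy sketch of that initialization idea (the function, the pinhole back-projection, and all array names are our own illustration, not the repository's API):

```python
import numpy as np

def init_gaussians_from_superpixels(labels, depth, K):
    """For each superpixel, back-project its pixels using the depth map and
    take the 3D centroid as the mean of one Gaussian (illustrative only)."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    v, u = np.indices(labels.shape)          # pixel rows (v) and columns (u)
    z = depth
    # Pinhole back-projection of every pixel into camera coordinates
    pts = np.stack([(u - cx) * z / fx, (v - cy) * z / fy, z], axis=-1)
    means = []
    for sp in np.unique(labels):
        means.append(pts[labels == sp].mean(axis=0))  # 3D centroid per superpixel
    return np.stack(means)                   # (num_superpixels, 3) Gaussian means

# Toy example: a 4x4 image split into two superpixels at constant depth 2 m.
labels = np.zeros((4, 4), dtype=int); labels[:, 2:] = 1
depth = np.full((4, 4), 2.0)
K = np.array([[100.0, 0, 2.0], [0, 100.0, 2.0], [0, 0, 1.0]])
means = init_gaussians_from_superpixels(labels, depth, K)
print(means.shape)  # (2, 3)
```

One Gaussian per superpixel keeps the number of primitives small, which is what makes per-frame optimization over many views tractable.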
- Release source code for inference and training.
- Release the spatially consistent pseudo-depth labels.
- Release the model checkpoints.
Follow these steps to set up the DCHM codebase on your system.
```bash
git clone https://github.com/Jiahao-Ma/DCHM-code
cd DCHM-code
conda create -n DCHM python=3.10
conda activate DCHM
pip3 install torch torchvision torchaudio  # use the correct CUDA version for your system
```
Following the Grounded-SAM preparation process, create a `submodules` folder and clone the required libraries into it.
- **Grounded-SAM**
```bash
# Clone Grounded-SAM and install dependencies
mkdir submodules && cd submodules
git clone https://github.com/IDEA-Research/Grounded-Segment-Anything.git
cd Grounded-Segment-Anything
python -m pip install -e segment_anything
pip install --no-build-isolation -e GroundingDINO
pip install --upgrade "diffusers[torch]"
git submodule update --init --recursive
cd grounded-sam-osx && bash install.sh
git clone https://github.com/xinyu1205/recognize-anything.git
pip install -r ./recognize-anything/requirements.txt
pip install -e ./recognize-anything/
# Download the pre-trained models
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_h_4b8939.pth
wget https://github.com/IDEA-Research/GroundingDINO/releases/download/v0.1.0-alpha/groundingdino_swint_ogc.pth
```
- **GSplat**
We customize Gaussian Splatting (GSplat) for rendering, with the implementation available in `submodules/customized_gsplat/rendering.py`.
```bash
pip install gsplat
```
- **DepthAnything v2**
```bash
git clone https://github.com/DepthAnything/Depth-Anything-V2
```
- **MvCHM**
We evaluate our consistent representation on supervised multi-view detection using MvCHM.
```bash
git clone https://github.com/Jiahao-Ma/MvCHM.git
```
```bash
# The code and data are coming soon ...
python supervised_cluster.py \
    --root 'path/to/wildtrack_data_gt' \
    --round 2_1 \
    --n-segments 30 \
    --start-with 0 \
    --end-with -1 \
    --fun_type 'Inference'
```

```bash
python unsupervised_cluster.py \
    --root 'path/to/wildtrack_data_gt' \
    --round 2_1 \
    --n-segments 30 \
    --start-with 0 \
    --end-with 10 \
    --min_gs_threshold 10
```
We provide the initial training pipeline below; for the complete iterative process, please refer to `pipeline.sh`.
If you prefer to skip the training process for generating consistent pseudo-depth labels:
⬇️ Download pre-generated labels (Coming soon)
Note: The pseudo-depth labels are currently in preparation and will be available shortly. We recommend checking back later or following our release updates.
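For intuition, "spatially consistent" means that a pixel's depth, back-projected into 3D and transformed into another camera, should agree with that camera's depth prediction at the corresponding point. A hedged numpy sketch of such a cross-view consistency check (the camera convention, threshold, and function name are illustrative, not the repository's implementation):

```python
import numpy as np

def depth_consistent(p_uv, depth_a, K_a, T_ab, depth_b_at, thresh=0.05):
    """Check whether the depth at pixel p_uv in view A agrees with view B.

    T_ab: 4x4 rigid transform taking view-A camera coords to view-B coords.
    depth_b_at: depth that view B predicts for the transformed 3D point.
    """
    u, v = p_uv
    z = depth_a[v, u]
    # Back-project pixel (u, v) with depth z into view-A camera coordinates
    x = (u - K_a[0, 2]) * z / K_a[0, 0]
    y = (v - K_a[1, 2]) * z / K_a[1, 1]
    p_b = T_ab @ np.array([x, y, z, 1.0])
    # Consistent if view B's depth matches the transformed point's depth
    return abs(p_b[2] - depth_b_at) < thresh * z

K = np.array([[500.0, 0, 320.0], [0, 500.0, 240.0], [0, 0, 1.0]])
depth_a = np.full((480, 640), 3.0)
T_ab = np.eye(4); T_ab[0, 3] = 0.5   # 0.5 m sideways baseline
print(depth_consistent((320, 240), depth_a, K, T_ab, 3.0))   # True
print(depth_consistent((320, 240), depth_a, K, T_ab, 2.0))   # False
```

Pixels failing such a check across views are exactly the ones an iterative pipeline would relabel before the next fine-tuning round.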
```bash
# Step 1: Extract video frames
DATA_DIR=/path/to/wildtrack_data_gt
python s01_data_download.py \
    --data-dir ${DATA_DIR} \
    --duration 35 \
    --fps 2 \
    --output-folder Image_subsets
```

```bash
# Step 2: Segment the foreground and background using Grounded-SAM
mkdir submodules && cd submodules
git clone https://github.com/IDEA-Research/Grounded-Segment-Anything.git
cd Grounded-Segment-Anything

# Step 2.1: Generate the foreground (human) mask
TARGET1="people"
python s02_sam2_wgt.py \
    --root ${DATA_DIR} \
    --config submodules/Grounded-Segment-Anything/GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
    --grounded_checkpoint submodules/Grounded-Segment-Anything/groundingdino_swint_ogc.pth \
    --sam_checkpoint submodules/Grounded-Segment-Anything/sam_vit_h_4b8939.pth \
    --input_image None \
    --output_dir ${DATA_DIR}/masks/${TARGET1} \
    --box_threshold 0.3 \
    --text_threshold 0.25 \
    --text_prompt ${TARGET1} \
    --device "cuda:0"
python s02_sam2.py \
    --root ${DATA_DIR} \
    --config submodules/Grounded-Segment-Anything/GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
    --grounded_checkpoint submodules/Grounded-Segment-Anything/groundingdino_swint_ogc.pth \
    --sam_checkpoint submodules/Grounded-Segment-Anything/sam_vit_h_4b8939.pth \
    --input_image None \
    --output_dir ${DATA_DIR}/masks/${TARGET1} \
    --box_threshold 0.3 \
    --text_threshold 0.25 \
    --text_prompt ${TARGET1} \
    --device "cuda:0"
```
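The `--box_threshold` and `--text_threshold` flags follow Grounding DINO's filtering: a box is kept if its strongest token logit against the text prompt exceeds `box_threshold` (with `text_threshold` applied similarly at the token level when extracting phrases). A simplified numpy illustration of the box-level filtering (the logits array is made up for the example):

```python
import numpy as np

# Per-box similarity logits against the text prompt tokens (3 boxes, 4 tokens).
logits = np.array([
    [0.60, 0.10, 0.05, 0.02],   # confident box, kept
    [0.28, 0.25, 0.20, 0.10],   # best token below 0.3, dropped
    [0.45, 0.40, 0.02, 0.01],   # confident box, kept
])
box_threshold = 0.3

# Keep boxes whose strongest token logit clears the threshold
keep = logits.max(axis=1) > box_threshold
print(keep)  # [ True False  True]
```

Lowering `--box_threshold` recovers more pedestrians in crowded frames at the cost of more false-positive masks.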
```bash
# Step 2.2: Generate the background (ground) mask
TARGET2="ground"
python s02_sam2.py \
    --root ${DATA_DIR} \
    --config submodules/Grounded-Segment-Anything/GroundingDINO/groundingdino/config/GroundingDINO_SwinT_OGC.py \
    --grounded_checkpoint submodules/Grounded-Segment-Anything/groundingdino_swint_ogc.pth \
    --sam_checkpoint submodules/Grounded-Segment-Anything/sam_vit_h_4b8939.pth \
    --input_image None \
    --output_dir ${DATA_DIR}/masks/${TARGET2} \
    --box_threshold 0.3 \
    --text_threshold 0.25 \
    --text_prompt ${TARGET2} \
    --device "cuda:0"
```

```bash
# Step 3: Initialization for GS. Generate superpixels for the foreground
python s03_super_pixel.py \
    --root ${DATA_DIR} \
    --n_segments "30, 60"
```

```bash
# --- Iterative Matching Label --- #
# --- First Round --- #
# Step 4: Per-frame training for GS
python s04_gs_us.py --root ${DATA_DIR} --round 1_1 --n-segments 30 --start-with 1 --end-with -1

# Step 5: Fine-tuning mono-depth estimation
python s05_depth_finetune.py \
    --epochs 200 \
    --encoder vits \
    --bs 1 \
    --lr 0.000005 \
    --save-path "output/1_1_w_dc_200" \
    --dataset wildtrack \
    --img-size 518 \
    --min-depth 0.001 \
    --max-depth 40 \
    --pretrained-from checkpoints/depth_anything_v2_vits.pth \
    --port 20596 \
    --data_root ${DATA_DIR} \
    --round 1_1
```
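Metric-depth fine-tuning of monocular models (as in Depth Anything v2's metric-depth setup) typically minimizes a scale-invariant log loss (SiLog, after Eigen et al.). A self-contained numpy version for intuition; this is the standard SiLog formulation, not necessarily the exact loss used in `s05_depth_finetune.py`:

```python
import numpy as np

def silog_loss(pred, target, lam=0.5, eps=1e-6):
    """Scale-invariant log loss of Eigen et al.: penalizes log-depth errors
    while partially forgiving a global scale offset (controlled by lam)."""
    d = np.log(pred + eps) - np.log(target + eps)
    return np.sqrt(np.mean(d ** 2) - lam * np.mean(d) ** 2)

target = np.array([1.0, 2.0, 4.0])
print(silog_loss(target, target))        # ~0: perfect prediction
print(silog_loss(2.0 * target, target))  # > 0, but damped relative to lam=0
```

With `lam=1` a uniform scale error would cost nothing; `lam=0.5` keeps some pressure toward the correct metric scale, which matters here because the pseudo-depth labels come from multi-view geometry.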
Note: please refer to `pipeline.sh` for the full iterative training process, which is designed to produce consistent pseudo-depth labels and refine the mono-depth model.
Fine-tuning is required for the supervised method only.
```bash
python supervised_cluster.py \
    --root 'path/to/wildtrack_data_gt' \
    --round 2_1 \
    --n-segments 30 \
    --start-with 0 \
    --end-with -1 \
    --fun_type 'Train'
```
The trained checkpoint of the supervised decoder will be released soon.
```bash
python supervised_evaluate.py \
    --pr_dir_pred "output/exp_sup/pr_dir_pred.txt" \
    --pr_dir_gt "output/exp_sup/pr_dir_gt.txt" \
    --sup_decoder_checkpoint "path/to/checkpoint.pth"
```
The trained GS and depth prediction model checkpoints will be released soon.
```bash
python unsupervised_evaluate.py \
    --root 'path/to/wildtrack_data_gt' \
    --round 1_2 \
    --n-segments 30 \
    --start-with 0 \
    --end-with 10 \
    --min_gs_threshold 10
```
If you find our code or paper useful for your research, please consider citing:
```bibtex
@article{ma2025dchm,
  title={DCHM: Depth-Consistent Human Modeling for Multiview Detection},
  author={Ma, Jiahao and Wang, Tianyu and Liu, Miaomiao and Ahmedt-Aristizabal, David and Nguyen, Chuong},
  journal={arXiv preprint arXiv:2507.14505},
  year={2025}
}
```