This is the official implementation of the paper *Weakly-supervised Fingerspelling Recognition in British Sign Language Videos* (BMVC 2022). The code has been tested with Python 3.6.8. A pre-trained fingerspelling checkpoint is also released below.
Install the dependencies:

```bash
pip install -r requirements.txt
```

- Download the pre-trained Transpeller checkpoint:
- Get the video features:
- Follow the instructions on the BOBSL page to get the username and password to access parts of the BOBSL dataset.
```bash
cd features
sh download_features.sh username password
```
- Get the annotations:
```bash
cd data
sh download.sh username password
```

This is a fast download that obtains the manually verified test annotations and the automatically obtained annotations for the BOBSL episodes. For the automatic annotations, a `?` in the `word` column indicates that the fingerspelled word could not be determined by the automatic pseudo-labelling method.
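If you want to work with only the resolved automatic annotations, the undetermined entries can be dropped. Below is a minimal sketch assuming pandas and a hypothetical CSV file name (the real file names come from the download above); only the `word` column is confirmed by this README:

```python
import pandas as pd

# Hypothetical file name; substitute an automatic-annotation CSV
# obtained by the download script above.
df = pd.read_csv("data/automatic_annotations.csv")

# "?" in the `word` column marks words the automatic pseudo-labelling
# method could not determine; keep only the resolved rows.
resolved = df[df["word"] != "?"]
print(f"Kept {len(resolved)} of {len(df)} annotations")
```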
Evaluate the pre-trained checkpoint:

```bash
python test.py --ckpt_path data/transpeller.pth \
               --builder localizer_ctc \
               --test_csv data/fingerspelling-data-bmvc2022/transpeller-test.csv \
               --feat_root features/video-swin-s_c8697_16f_bs32/
```

The above run should give a CER of 53.1. You can also turn on the `--full_word_test` flag to compute the CER with full words, which should give 59.9.
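For reference, CER (character error rate) is the character-level edit distance between the prediction and the ground truth, divided by the length of the ground truth. The snippet below is a minimal illustration of the metric, not the repository's evaluation code:

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings via dynamic programming."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(
                dp[j] + 1,          # deletion
                dp[j - 1] + 1,      # insertion
                prev + (ca != cb),  # substitution (free if chars match)
            )
    return dp[-1]

def cer(pred: str, ref: str) -> float:
    return edit_distance(pred, ref) / len(ref)

print(cer("tranpeler", "transpeller"))  # 2 edits / 11 chars ~= 0.18
```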
We also release the pre-trained Video-Swin model that is used to extract the features mentioned above. The model has been trained on person crops of the BOBSL dataset, so it works best when your signer crops are similar to the BOBSL ones. You can get the pre-trained checkpoint here. Below is a small example of how to use it:
```python
import torch

from videoswin import SwinTransformer3D, VideoPreprocessing
from utils import load

# BOBSL person-crop video input
model = SwinTransformer3D()
model = load("video-swin-s.pth")[0]  # load() returns the pre-trained model first
vp = VideoPreprocessing()

# Read an *RGB* video clip of shape (batch_size, 3, 16, 256, 256),
# e.g. with OpenCV (see the sketch below); a random tensor stands in here.
clip = torch.randn(1, 3, 16, 256, 256)

clip = vp(clip)         # (batch_size, 3, 16, 224, 224)
features = model(clip)  # (batch_size, 768)
```

The code, models, and the released annotations are governed by the same licensing terms stated on the official BOBSL page.
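The example above assumes you already have a clip tensor. Here is a minimal sketch of producing one with OpenCV (the file name `signer_crop.mp4` is hypothetical; whether the tensor should hold raw [0, 255] pixel values or be normalized depends on `VideoPreprocessing`, so check the repository code):

```python
import cv2
import numpy as np
import torch

def read_clip(path: str, num_frames: int = 16, size: int = 256) -> torch.Tensor:
    """Read the first `num_frames` frames as an RGB clip of shape (1, 3, T, size, size)."""
    cap = cv2.VideoCapture(path)
    frames = []
    while len(frames) < num_frames:
        ok, frame = cap.read()  # frame is HxWx3, BGR, uint8
        if not ok:
            raise IOError(f"could not read {num_frames} frames from {path}")
        frame = cv2.resize(frame, (size, size))
        frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    cap.release()
    clip = torch.from_numpy(np.stack(frames)).float()  # (T, H, W, 3)
    return clip.permute(3, 0, 1, 2).unsqueeze(0)       # (1, 3, T, H, W)

clip = read_clip("signer_crop.mp4")  # hypothetical BOBSL-style person crop
```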
Please cite the following paper if you use this repository:
```bibtex
@InProceedings{Prajwal22a,
  author    = "K R Prajwal and Hannah Bull and Liliane Momeni and Samuel Albanie and G{\"u}l Varol and Andrew Zisserman",
  title     = "Weakly-supervised Fingerspelling Recognition in British Sign Language Videos",
  booktitle = "British Machine Vision Conference",
  year      = "2022",
  keywords  = "sign language, fingerspelling, bsl, bobsl",
}
```