This repository contains research code for the paper *Towards Privacy-Aware Sign Language Translation at Scale*.
SSVP-SLT performs masked autoencoding (MAE) over anonymized, unannotated videos as a form of self-supervised pretraining, learning continuous sign language representations at scale. The learned representations are then transferred to the supervised, gloss-free sign language translation task. SSVP-SLT outperforms prior SOTA methods on the ASL-to-English How2Sign benchmark by over 3 BLEU points in both the finetuned and zero-shot settings.
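To give a sense of the pretraining objective, below is a minimal PyTorch sketch of the masked-autoencoding idea: encode only a small visible subset of video patches and reconstruct the hidden ones. This is not the repository's implementation (which builds on Hiera and MAE-ST); all module names and sizes are illustrative, and the positional embeddings and transformer decoder of a real MAE are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyVideoMAE(nn.Module):
    """Toy masked autoencoder over flattened spatiotemporal video patches.

    Illustrative only: a real MAE uses positional embeddings and a
    transformer encoder/decoder; here both are replaced by tiny MLPs.
    """

    def __init__(self, patch_dim: int = 768, embed_dim: int = 256, mask_ratio: float = 0.9):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.encoder = nn.Sequential(
            nn.Linear(patch_dim, embed_dim),
            nn.GELU(),
            nn.Linear(embed_dim, embed_dim),
        )
        self.mask_token = nn.Parameter(torch.zeros(embed_dim))
        self.decoder = nn.Linear(embed_dim, patch_dim)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (batch, num_patches, patch_dim), e.g. video tubelets
        b, n, d = patches.shape
        num_keep = int(n * (1 - self.mask_ratio))
        perm = torch.rand(b, n, device=patches.device).argsort(dim=1)
        keep_idx, mask_idx = perm[:, :num_keep], perm[:, num_keep:]
        # Encode only the small visible subset (this is what makes MAE cheap).
        visible = torch.gather(patches, 1, keep_idx.unsqueeze(-1).expand(-1, -1, d))
        latent = self.encoder(visible)
        # Condition mask tokens on the pooled latent and reconstruct the
        # hidden patches; the loss is computed on masked positions only.
        mask_tokens = self.mask_token + latent.mean(dim=1, keepdim=True)
        recon = self.decoder(mask_tokens.expand(-1, mask_idx.shape[1], -1))
        target = torch.gather(patches, 1, mask_idx.unsqueeze(-1).expand(-1, -1, d))
        return F.mse_loss(recon, target)


loss = TinyVideoMAE()(torch.randn(2, 128, 768))
loss.backward()
```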
- Installation instructions are in INSTALL.md.
- Dataset preparation is described in DATASETS.md.
- MAE pretraining instructions are in pretraining/README.md.
- Joint MAE & CLIP/FLIP pretraining instructions are in pretraining_clip/README.md.
- Feature extraction and SLT training and evaluation instructions are in translation/README.md.
As part of this project, we release the DailyMoth-70h (DM-70) dataset under a CC-BY-NC 4.0 license. An overview of the data, along with download and preparation instructions, can be found in DATASETS.md.
Alternatively, download the files manually via these links:
| Subset | Link | md5 |
|---|---|---|
| Raw videos | download | `875ffe4eeac3a37e50b4202c2b4996d2` |
| Blurred clips | download | `a2819c7b06a8b38eb7686e4dc90a7433` |
| Unblurred clips | download | `3e69046f6cf415cec89c3544d0523325` |
| Manifest files | download | `69e500cc5cfad3133c4b589428865472` |
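After downloading, you can verify each archive against the checksums above, for example with a small Python helper. The filename below is a placeholder; substitute the name of the archive you actually downloaded.

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the md5 hex digest of a file, streaming in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Placeholder filename; compare against the md5 column above.
assert md5sum("manifest_files.tar.gz") == "69e500cc5cfad3133c4b589428865472"
```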
> **Note:** Check out our paper for detailed information on the DailyMoth-70h dataset.
To try ASL translation using the massively multilingual and multimodal SONAR sentence embedding space, see here.
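As a minimal sketch, the text side of the SONAR space can be loaded via the open-source `sonar-space` package; the pipeline and model names below follow the SONAR repository, and the sign-video side of the pipeline is documented at the link above.

```python
from sonar.inference_pipelines.text import TextToEmbeddingModelPipeline

# Load the basic SONAR text encoder (downloads weights on first use).
t2vec_model = TextToEmbeddingModelPipeline(
    encoder="text_sonar_basic_encoder",
    tokenizer="text_sonar_basic_encoder",
)
embeddings = t2vec_model.predict(["My name is SONAR."], source_lang="eng_Latn")
print(embeddings.shape)  # one 1024-dimensional sentence embedding
```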
If you find our work useful in your research, please consider citing:
```bibtex
@inproceedings{rust-etal-2024-towards,
    title = "Towards Privacy-Aware Sign Language Translation at Scale",
    author = "Rust, Phillip and Shi, Bowen and Wang, Skyler and Camgoz, Necati Cihan and Maillard, Jean",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.467",
    pages = "8624--8641",
}
```
This codebase is heavily influenced by the mae and mae_st repositories. Our models are based on code from Hiera, HF Transformers, OpenCLIP, and Fairseq.
This project is primarily under the CC-BY-NC 4.0 license; see LICENSE for details. Portions of the project are available under separate license terms: Transformers is licensed under the Apache-2.0 license and OpenCLIP is licensed under the OpenCLIP license.