This repository contains research code for the paper *Towards Privacy-Aware Sign Language Translation at Scale*.
SSVP-SLT performs masked autoencoding (MAE) over anonymized, unannotated videos as a form of self-supervised pretraining, learning continuous sign language representations at scale. The learned representations are then transferred to the supervised, gloss-free sign language translation task. SSVP-SLT outperforms prior SOTA methods on the ASL-to-English How2Sign benchmark by over 3 BLEU points in both the finetuned and zero-shot settings.
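To give a sense of the pretraining objective, below is a minimal PyTorch sketch of the masked-autoencoding idea: encode only a small visible subset of video patches and reconstruct the hidden ones. This is not the repository's implementation (which builds on Hiera and MAE-ST); all module names and sizes are illustrative, and the positional embeddings and transformer decoder of a real MAE are omitted for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class TinyVideoMAE(nn.Module):
    """Toy masked autoencoder over flattened spatiotemporal video patches.

    Illustrative only: a real MAE uses positional embeddings and a
    transformer encoder/decoder; here both are replaced by tiny MLPs.
    """

    def __init__(self, patch_dim: int = 768, embed_dim: int = 256, mask_ratio: float = 0.9):
        super().__init__()
        self.mask_ratio = mask_ratio
        self.encoder = nn.Sequential(
            nn.Linear(patch_dim, embed_dim),
            nn.GELU(),
            nn.Linear(embed_dim, embed_dim),
        )
        self.mask_token = nn.Parameter(torch.zeros(embed_dim))
        self.decoder = nn.Linear(embed_dim, patch_dim)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (batch, num_patches, patch_dim), e.g. video tubelets
        b, n, d = patches.shape
        num_keep = int(n * (1 - self.mask_ratio))
        perm = torch.rand(b, n, device=patches.device).argsort(dim=1)
        keep_idx, mask_idx = perm[:, :num_keep], perm[:, num_keep:]
        # Encode only the small visible subset (this is what makes MAE cheap).
        visible = torch.gather(patches, 1, keep_idx.unsqueeze(-1).expand(-1, -1, d))
        latent = self.encoder(visible)
        # Condition mask tokens on the pooled latent and reconstruct the
        # hidden patches; the loss is computed on masked positions only.
        mask_tokens = self.mask_token + latent.mean(dim=1, keepdim=True)
        recon = self.decoder(mask_tokens.expand(-1, mask_idx.shape[1], -1))
        target = torch.gather(patches, 1, mask_idx.unsqueeze(-1).expand(-1, -1, d))
        return F.mse_loss(recon, target)


loss = TinyVideoMAE()(torch.randn(2, 128, 768))
loss.backward()
```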
- Installation instructions are in INSTALL.md.
- Dataset preparation is described in DATASETS.md.
- MAE pretraining instructions are in pretraining/README.md.
- Joint MAE & CLIP/FLIP pretraining instructions are in pretraining_clip/README.md.
- Feature extraction and SLT training and evaluation instructions are in translation/README.md.
As part of this project, we release the DailyMoth-70h (DM-70) dataset under a CC-BY-NC 4.0 license. An overview of the data, along with download and preparation instructions, can be found in DATASETS.md.
Alternatively, download the files manually via these links:
| Subset | Link | md5 |
|---|---|---|
| Raw videos | download | `875ffe4eeac3a37e50b4202c2b4996d2` |
| Blurred clips | download | `a2819c7b06a8b38eb7686e4dc90a7433` |
| Unblurred clips | download | `3e69046f6cf415cec89c3544d0523325` |
| Manifest files | download | `69e500cc5cfad3133c4b589428865472` |
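After downloading, you can verify each archive against the checksums above, for example with a small Python helper. The filename below is a placeholder; substitute the name of the archive you actually downloaded.

```python
import hashlib


def md5sum(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the md5 hex digest of a file, streaming in 1 MiB chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Placeholder filename; compare against the md5 column above.
assert md5sum("manifest_files.tar.gz") == "69e500cc5cfad3133c4b589428865472"
```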
> **Note:** Check out our paper for detailed information on the DailyMoth-70h dataset.
To try ASL translation using the massively multilingual and multimodal SONAR sentence embedding space, see here.
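As a minimal sketch, the text side of the SONAR space can be loaded via the open-source `sonar-space` package; the pipeline and model names below follow the SONAR repository, and the sign-video side of the pipeline is documented at the link above.

```python
from sonar.inference_pipelines.text import TextToEmbeddingModelPipeline

# Load the basic SONAR text encoder (downloads weights on first use).
t2vec_model = TextToEmbeddingModelPipeline(
    encoder="text_sonar_basic_encoder",
    tokenizer="text_sonar_basic_encoder",
)
embeddings = t2vec_model.predict(["My name is SONAR."], source_lang="eng_Latn")
print(embeddings.shape)  # one 1024-dimensional sentence embedding
```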
If you find our work useful in your research, please consider citing:
```bibtex
@inproceedings{rust-etal-2024-towards,
    title = "Towards Privacy-Aware Sign Language Translation at Scale",
    author = "Rust, Phillip and Shi, Bowen and Wang, Skyler and Camgoz, Necati Cihan and Maillard, Jean",
    booktitle = "Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    year = "2024",
    address = "Bangkok, Thailand",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.acl-long.467",
    pages = "8624--8641",
}
```
This codebase is heavily influenced by the mae and mae_st repositories. Our models are based on code from Hiera, HF Transformers, OpenCLIP, and Fairseq.
This project is primarily under the CC-BY-NC 4.0 license; see LICENSE for details. Portions of the project are available under separate license terms: Transformers is licensed under the Apache-2.0 license and OpenCLIP is licensed under the OpenCLIP license.