Baseline systems

We provide code to train baseline systems for DSGS-to-German translation in a public GitHub repository. The codebase contains scripts to preprocess data and to train, translate with, and evaluate models. The underlying sequence-to-sequence toolkit is Sockeye, which is based on PyTorch.

For questions or comments regarding the code, please open an issue on GitHub. Pull requests with contributions are very welcome.

Data set loaders

We added our training corpora to the sign_language_datasets library, so they can now be loaded with TensorFlow Datasets (TFDS). For example, provided that you have obtained the Zenodo access tokens:

import tensorflow_datasets as tfds

# Importing this module registers the sign language datasets with TFDS
import sign_language_datasets.datasets
from sign_language_datasets.datasets.config import SignDatasetConfig

# Populate your access tokens
TOKENS = {
    "zenodo_focusnews_token": "TODO",
    "zenodo_srf_videos_token": "TODO",
    "zenodo_srf_poses_token": "TODO",
}

# Load only the annotations, and include path to video files
config = SignDatasetConfig(name="annotations", version="1.0.0", process_video=False)
wmtslt = tfds.load(name="wmtslt", builder_kwargs={"config": config, **TOKENS})

Example: loading training data as a TFDS data set
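Hard-coding access tokens in a script makes them easy to leak through version control. One possible alternative, sketched below, reads them from environment variables and builds the dictionary that the example above passes as **TOKENS. The helper name load_tokens_from_env and the environment variable names (the upper-cased token keys, e.g. ZENODO_FOCUSNEWS_TOKEN) are assumptions made for illustration; they are not part of sign_language_datasets.

```python
import os

# Token keys as they appear in the loading example above.
TOKEN_KEYS = (
    "zenodo_focusnews_token",
    "zenodo_srf_videos_token",
    "zenodo_srf_poses_token",
)


def load_tokens_from_env(environ=None):
    """Build the token dictionary from environment variables.

    Assumption (hypothetical convention): each token is stored under the
    upper-cased key name, e.g. ZENODO_FOCUSNEWS_TOKEN.
    """
    if environ is None:
        environ = os.environ

    tokens = {}
    missing = []
    for key in TOKEN_KEYS:
        value = environ.get(key.upper())
        if value:
            tokens[key] = value
        else:
            # Collect every missing variable so the error reports all of them.
            missing.append(key.upper())

    if missing:
        raise KeyError("Missing environment variables: " + ", ".join(missing))
    return tokens
```

With the three variables exported in the shell, TOKENS = load_tokens_from_env() could replace the hand-written dictionary, and the rest of the example would stay unchanged.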

See the README for further instructions and usage examples. For questions or comments regarding this loader, please open an issue on GitHub.