The WMT shared task on sign language translation follows the procedures established by the shared task on machine translation that has been running since 2007. Here we outline the participation process to make it clearer for new participants.
Please note: Participation is about exchanging ideas, not about winning the competition! We encourage broad participation -- if you feel that your method is interesting but not state-of-the-art, please participate anyway in order to disseminate it and measure progress. The community may build on your idea and cite your paper, or learn something useful from negative results.
Step 1: System training
The participants download the training data and, optionally, the software of the baseline system. One can participate in any or all of the language directions.
The training data can be fed into the baseline system we provide in order to train a new system. The baseline is a very basic system: a simple implementation of sequence-to-sequence methods for this task. Participants may modify the baseline system to improve its performance, or use software they have written themselves.
As per WMT standards, participants may use external data from other sources to improve the quality of their system. In that case, though, the systems will be referred to as unconstrained and marked as such in the final evaluation.
Note that basic linguistic tools such as taggers, parsers, or morphological analyzers are allowed in the constrained condition, as are pretrained language models released before February 2023. General tools for video processing, such as pose extraction, image feature extraction, or models such as CLIP, are also allowed for a constrained submission.
During the system training step, the participants do not have access to the test data.
Step 2: Processing of the test data
The shared task organizers make the test sets available. Each test set contains only the source side of its translation direction.
The participants use their systems (trained in the previous step) to translate the newly given test set.
The test set should be used only for decoding, i.e. running the already trained system to produce translations. It is meant to be “blind” with respect to training: it must not be used as training data, or in other steps such as tuning or data augmentation.
Step 3: Submission of the system outputs
The participants submit their system outputs to the WMT platform called OCELoT. By submitting system outputs to OCELoT, participants agree that their system outputs will be made publicly available later.
Before participants can submit system outputs, they need to "sign up" or "register" a team in OCELoT. A participant can only have one team in OCELoT for making submissions. Each OCELoT team is verified by the organizers, which is a manual process that may take some time (please contact us in case of a delay or if you have questions). Participants are therefore advised to register as soon as they are sure they will participate in the shared task, even before their system is ready.
Participants may upload up to seven submissions per language direction in total, and must mark one of them as the primary submission. Since the resources for human evaluation are limited, the organizers will give priority to the primary submissions.
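The limits in this paragraph can be checked locally before uploading. The following is a minimal sketch, assuming a hypothetical local record of planned submissions; the `Submission` class, the `check_submissions` function and the direction label are illustrative only and are not part of OCELoT.

```python
from collections import defaultdict
from dataclasses import dataclass

MAX_SUBMISSIONS_PER_DIRECTION = 7  # limit stated in the task description


@dataclass
class Submission:
    # Hypothetical local record; OCELoT does not expose this structure.
    direction: str          # illustrative label, e.g. "sign-to-text"
    name: str
    is_primary: bool = False


def check_submissions(submissions):
    """Return a list of human-readable problems; empty if none were found."""
    by_direction = defaultdict(list)
    for sub in submissions:
        by_direction[sub.direction].append(sub)

    problems = []
    for direction, subs in sorted(by_direction.items()):
        if len(subs) > MAX_SUBMISSIONS_PER_DIRECTION:
            problems.append(
                f"{direction}: {len(subs)} submissions exceed the limit "
                f"of {MAX_SUBMISSIONS_PER_DIRECTION}"
            )
        primaries = sum(1 for s in subs if s.is_primary)
        if primaries != 1:
            problems.append(
                f"{direction}: expected exactly one primary submission, "
                f"found {primaries}"
            )
    return problems
```

An empty result means that every direction has at most seven planned submissions and exactly one of them is marked as primary.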
For the sign-to-text translation direction, OCELoT produces scores based on automatic metrics. (The platform does not display any automatic scores for the text-to-sign directions.) Importantly, the ranking of systems on OCELoT is based on automatic scores only and is not the final one. The final ranking of the shared task will appear in the findings paper and will be based on human evaluation.
After making a submission, participants have to specify whether the submitted system is constrained or unconstrained.
Any technical updates related to OCELoT will be published under the Competition updates on the submission platform.
Furthermore, submissions should use the WMT XML format. We provide the test sources in this XML format (see the data tab). The exact submission format we require depends on the translation direction:
For the sign-to-text direction, system outputs are German text. We require that these outputs are combined with the XML file containing the test sources, see here for an example.
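Combining text outputs with the source XML could look roughly like the sketch below. It is not the official tooling: the element and attribute names (`dataset`, `doc`, `src`, `seg`, `hyp`, `system`, `language`) and the function `add_hypotheses` are assumptions made for illustration, and the linked example remains the authoritative reference for the required layout.

```python
import copy
import xml.etree.ElementTree as ET

# Illustrative stand-in for a test-source file. The element and attribute
# names are assumptions, not the official schema.
SOURCE_XML = """<dataset id="example">
  <doc id="doc0">
    <src>
      <p><seg id="0"></seg><seg id="1"></seg></p>
    </src>
  </doc>
</dataset>"""


def add_hypotheses(source_xml, translations, system, language="de"):
    """Attach one translation per source segment, in document order."""
    root = ET.fromstring(source_xml)

    # Refuse to write a file whose segments and translations do not line up.
    segment_count = sum(1 for _ in root.iter("seg"))
    if segment_count != len(translations):
        raise ValueError(
            f"{segment_count} source segments but {len(translations)} translations"
        )

    remaining = iter(translations)
    for doc in root.iter("doc"):
        src = doc.find("src")
        # Copy the source structure so that segment ids match one-to-one.
        hyp = copy.deepcopy(src)
        hyp.tag = "hyp"
        hyp.attrib.clear()
        hyp.set("system", system)
        hyp.set("language", language)
        for seg in hyp.iter("seg"):
            seg.text = next(remaining)
        doc.append(hyp)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(add_hypotheses(SOURCE_XML, ["Guten Tag.", "Danke."], system="demo"))
```

Copying the source structure keeps the segment ids of the hypotheses identical to those of the sources, so a mismatch in the number of lines is caught before anything is written.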
For the text-to-sign directions, the outputs in the sign languages must be one mp4 video for each line of spoken language input text. Participants are free to submit any content they deem suitable as a translation. Examples: avatar animations, videos featuring photo-realistic signers, or a pose estimation video.
Output videos should be wrapped in XML as well, but instead of text the <segment> elements in the XML should contain public links to the videos.
Important: we will attempt to retrieve the public links right after the deadline. The videos must be available to us on that day. If possible, please make sure that the links can be downloaded directly with a wget command or similar.
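A quick way to test whether a link is directly retrievable is sketched below. It assumes that a direct download answers an HTTP HEAD request with status 200 and a content type other than an HTML page; the function names are hypothetical, and some hosts reject HEAD requests, so a failed check is a hint to retry with wget rather than proof that the link is broken.

```python
import urllib.request
from urllib.parse import urlparse


def looks_public(url):
    """Cheap offline check: an absolute http(s) URL with a host name."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def is_directly_downloadable(url, opener=urllib.request.urlopen, timeout=30):
    """Return True if a HEAD request succeeds and does not return an HTML page.

    The `opener` parameter exists only so that the network call can be
    replaced in tests; by default the standard library opener is used.
    """
    if not looks_public(url):
        return False
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener(request, timeout=timeout) as response:
            content_type = response.headers.get("Content-Type", "")
            # A login or preview page is usually served as text/html.
            return response.status == 200 and not content_type.startswith("text/html")
    except OSError:
        # URLError and HTTPError are subclasses of OSError.
        return False
```

Running `is_directly_downloadable` over every link in the submission file shortly before the deadline helps catch links that lead to a preview or login page instead of the video itself.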