MT-Stretch

Published: 2 September 2019

Aims

To test whether the output of machine translation systems can be rapidly improved if neural machine translation models immediately incorporate human corrections into subsequent translations.

Background

Research into machine translation (MT) is one of the key elements of our Multilingual Solutions workstream. MT-Stretch is a research collaboration with the University of Edinburgh, closely aligned with the wider GoURMET project.

The project aims to create algorithms that can identify and understand human-made edits to machine-translated text, so that these edits can then be fed back into the machine translation system. The system could then learn from corrections to its mistranslations, increasing its accuracy.
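As an illustration of the first step, identifying what a human editor changed, the sketch below aligns a machine translation with its human post-edit at the word level and returns each changed span as a (machine text, human correction) pair. This is a minimal sketch assuming plain whitespace tokenisation and Python's standard `difflib` alignment; the function name `extract_edits` is hypothetical and this is not the MT-Stretch algorithm.

```python
from difflib import SequenceMatcher


def extract_edits(mt_output: str, post_edit: str) -> list[tuple[str, str]]:
    """Return (machine text, human correction) pairs for every changed span.

    Hypothetical helper: both strings are split on whitespace and aligned
    with difflib's matching-block algorithm, so edits are found per word.
    """
    mt_words = mt_output.split()
    pe_words = post_edit.split()
    matcher = SequenceMatcher(a=mt_words, b=pe_words, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Opcodes other than 'equal' are 'replace', 'delete' or 'insert'.
        if tag != "equal":
            edits.append((" ".join(mt_words[i1:i2]), " ".join(pe_words[j1:j2])))
    return edits


if __name__ == "__main__":
    print(extract_edits("The minister visit Mumbay on Monday",
                        "The minister visited Mumbai on Monday"))
```

Run on the example above, the sketch reports the single corrected span `("visit Mumbay", "visited Mumbai")`. Pairs like this are the kind of signal that could, in principle, be fed back to a model; how a real system weights or learns from them is a separate question.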

News stories can be challenging for machine translation, because new names of people and places appear regularly. MT models can struggle to translate these correctly, yet in news reporting it is of the utmost importance that such details are translated accurately.

As part of the work conducted with GoURMET, MT-Stretch's sister project, we have experimented with building models that work with glossaries and terminology lists, which users can update and expand, and with enabling models to learn through contextual data augmentation.
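To make the idea of a user-maintained glossary concrete, the sketch below stores source-term to target-term entries that users can add or overwrite at any time, and flags translations that ignore a required term. It is a hypothetical illustration, not the GoURMET approach: the `Glossary` class and its methods are invented for this example, and the plain case-insensitive substring matching is an assumption that a real system would need to refine (for example, to handle inflected forms).

```python
class Glossary:
    """Hypothetical user-editable glossary: source term -> required translation."""

    def __init__(self) -> None:
        self.entries: dict[str, str] = {}

    def add(self, source_term: str, target_term: str) -> None:
        # Users can add new names, or overwrite an entry, at any time.
        self.entries[source_term.casefold()] = target_term

    def missing_terms(self, source: str, translation: str) -> list[tuple[str, str]]:
        """Return entries whose source term occurs in `source` but whose
        required translation does not occur in `translation`.

        Uses plain case-insensitive substring matching (a simplification).
        """
        src = source.casefold()
        out = translation.casefold()
        return [
            (term, target)
            for term, target in self.entries.items()
            if term in src and target.casefold() not in out
        ]


if __name__ == "__main__":
    glossary = Glossary()
    glossary.add("मुंबई", "Mumbai")
    hindi = "नरेंद्र मोदी ने मुंबई का दौरा किया"
    print(glossary.missing_terms(hindi, "Narendra Modi visited Bombay"))
```

Here the Hindi sentence mentions मुंबई, so a translation that renders it as "Bombay" is flagged, while one that uses "Mumbai" passes. Keeping the glossary as editable data rather than retraining the model is what lets users add new names as soon as they appear in the news.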

Originally, this project aimed to work closely with the BBC News bureau in Delhi, which creates content in Hindi, Gujarati, Marathi, Punjabi, Tamil and Telugu. However, owing to the severe impact of the Covid-19 pandemic in 2020–2021, our plans to develop machine translation models that translate directly between these languages, without going through an intermediate language (and thus introducing additional cultural barriers), had to be revised and scaled back. These goals continue to offer wide scope for further research and experimentation.