GoURMET - Global Under-Resourced MEdia Translation

Creating machine translation models for under-served and under-resourced languages

Published: 30 June 2022

Aims

Training machine translation models on news output can help global media organisations utilise better MT solutions for low-resource languages.

What is GoURMET?

GoURMET stands for Global Under-Resourced MEdia Translation.

It is a 3.5-year multilingual, multinational project supported by the European Union's Horizon 2020 programme to improve machine translation, particularly for less common languages.

The project serves a dual purpose: training custom machine translation models on data from the news domain, and developing tools and ideas that use these models to support journalists in multilingual newsrooms.

Machine translation for a global newsroom

Workflow model for GoURMET Consortium

Generally, the more data available to develop a machine translation (MT) model, the better the result. Typically, the models require millions of translated sentences in order to reach acceptable performance.

For several languages, including many of the 40+ languages BBC World Service reports in, compiling high-quality training datasets is difficult. News content in these languages offers a valuable resource.

The project, run as part of News Labs' multilingual solutions stream, brought global media giants BBC and Deutsche Welle together with academic trailblazers from Alicante, Amsterdam and Edinburgh Universities to explore how under-resourced languages can be better served by MT solutions in a media setting.

We selected 16 languages to be trained on news data from the BBC and Deutsche Welle.

These are: Amharic, Bulgarian, Burmese, Gujarati, Hausa, Igbo, Kyrgyz, Macedonian, Pashto, Serbian, Swahili, Tamil, Tigrinya, Turkish, Urdu, Yoruba.

Languages covered in GoURMET

Since they are trained on the news provider's output, these models are aligned to the organisation's narrative style. Being custom models, they can process sensitive material securely and can be enhanced over time. For processing large volumes, they also have the potential to reduce costs.

News Labs senior software engineer Susie Coleman demonstrating GoURMET.

The project identified three areas with potential benefits:

  1. Monitoring: Removing language barriers so that all content is visible across the newsroom in each language.
  2. Content creation: Supporting the efficient transfer of content across languages via human validation and correction.
  3. Domain enhancement: Experimenting with developing glossary-led solutions for fields with highly specialised terms.

To use these models efficiently, and to explore how useful MT solutions can be, News Labs created a multilingual suite of prototypes for BBC journalists. The suite was shortlisted for a News Innovation Award in 2021. It comprises:

  1. Live Pages Monitor: A monitoring tool enabling BBC journalists to follow Live updates from any BBC Service in any language and immediately build on the local expertise.
  2. Frank: A discovery tool accumulating original and impactful content that can be reworked and reversioned for distribution across BBC outlets.
  3. Multilingual GST: A tool allowing under-resourced languages to benefit from machine learning solutions such as semi-automated graphics generation by employing MT models.

Over 200 BBC and DW journalists contributed to the project as data validators and evaluators, with many more contributing to the tool trials.

Our work demonstrated that it is possible to compete with, and surpass, the results from global tech giants, even on single iterations of training.

The project ran between January 2019 and June 2022. The 42-month project also spawned 70+ academic research papers across the GoURMET Consortium.

What next?

The models developed in the project have been open-sourced and are available to download on the project page. The prototypes and tools developed are available for use internally.

The project's EU reviewers recommended sustaining the line of research to explore further retraining options to ensure the models can improve over time.

News Labs continues to explore multilingual solutions in a bid to remove language barriers across BBC Newsrooms.

GoURMET project partners

Team

  • Andy Secker

    Former News Labs Language Technology Lead
  • Susie Coleman

    Former News Labs Senior Software Engineer
  • Anna Błaziak

    Former News Labs Software Engineer
  • Sevi Sariisik Tokalac

    Senior Research and Development Producer
  • Martin Valchev

    Former News Labs Junior Software Engineer
  • Rich Wareham

    Former News Labs Senior Software Engineer
  • Lei He

    Former News Labs Senior Software Engineer
