GoURMET - Global Under-Resourced MEdia Translation
GoURMET aims to create new machine translation technologies for translating between under-resourced languages and English
What is GoURMET?
Ever wondered how the machine translation you may have used recently became so accurate? The answer is AI trained on huge numbers of sentences that have already been translated between the most widely used languages.
But how can machine translation be built for languages without this wealth of training data? This is the challenge the GoURMET project is tackling. The BBC World Service reports in many languages for which gathering these training datasets is difficult, for example Kyrgyz.
GoURMET stands for "Global Under-Resourced MEdia Translation". This is an EU Horizon 2020 funded project with multiple academic and media partners around Europe.
GoURMET project abstract
As the BBC strives for the target of reaching a global audience of 500 million by 2022, growing the audience for the World Service across all 40 of our language services will be key. Bringing technologies such as Machine Translation into the newsroom supports this growth by allowing news and information to be shared more effectively between languages. It also frees journalists from translation and re-versioning, allowing them more time to create the world-class content which will drive growth.
Machine Translation also directly supports content creation for digital consumption as mobile, social media and similar channels become increasingly important means of access to news, especially in the developing world. There are also some tangible use cases for media monitoring: Machine Translation makes it possible to monitor media in smaller languages for which it does not make business sense to employ a speaker on a full-time basis.
As we try to bring language technologies into the newsroom, we need to ensure that all languages across the World Service and Monitoring departments gain from this technological investment. This includes the smaller languages which are not well served by commercial Machine Translation technologies. Academic collaborations such as GoURMET allow us to help develop and experiment with these technologies.
The key objective of the GoURMET project is to develop reliable Machine Translation (MT) capabilities for under-resourced languages. The usefulness of MT is severely constrained by the lack of sufficient training data for most language pairs and domains. Typically, the models require many millions of translated sentences in order to reach acceptable performance. GoURMET aims to deliver machine translation technologies which are robust enough to be deployed where little or no translated training data is available. Examples of such languages of particular importance to the BBC World Service include: Afaan Oromo and Tigrinya from East Africa; Igbo and Yoruba from Nigeria; and a number of languages from India, most notably Gujarati and Punjabi.
Why are we doing this?
We think machine translation can help us answer the following questions:
- How can news stories be made discoverable across languages?
- How can it be assured that content (video, audio, text) created in one language is available to other language departments for re-use?
- How can the re-use of content and the efficiency of the news organisation be monitored?
- How can we develop reporting capabilities for new languages with limited resources?
The project will focus on three use cases:
- Global content creation: managing content creation in several languages efficiently by providing machine translations for correction by humans;
- Media monitoring for low-resource language pairs: tools to address the challenge of monitoring media in strategically important languages;
- International business news analysis: reliably translating and analysing news in the highly specialised financial domain.
GoURMET project partners
- Edinburgh University (lead partner)
- BBC News Labs
- Deutsche Welle
- University of Alicante
- University of Amsterdam