Frank - Building a lingua franca through machine translation

Published: 20 September 2021

Aims

We could boost BBC World Service commissioning and planning capabilities by having the entire range of the BBC’s content in 40+ languages in one place, discovering what moved a specific audience segment at a glance, and filtering down to how a topic has been covered across languages.

Outline

Guided by data, Frank provides an easy-to-navigate platform for Language Services to showcase powerful, distinct, original BBC journalism, to serve the audience's needs and interests, and to offer a starting point for onward reversioning journeys.

It removes language barriers between teams and offers a level playing field. As such, its mission involves:

Achieving full transparency to support extensive editorial oversight, compliance, and commissioning processes.
Identifying the best of the BBC's journalism, with robust and proven performance to amplify impact.
Facilitating efficiency via reversioning and validation workflows, to direct more resources to original journalism.
Providing better audience value by responding to underserved audience segments and user needs through affinity mapping.

Background

Frank is one of the three prototypes developed under the EU-backed GoURMET project for machine translation solutions, and was shortlisted for the BBC News Awards 2021 for Innovation.

The wider project explored where and how machine translation (MT) might ease media workflows. It employed commercial MT models alongside GoURMET models tailor-built and trained on news data from the BBC and Deutsche Welle.

Frank is the project team's nickname for 'lingua franca', a reference to English as the shared language among BBC journalists.

Several BBC World Service teams expressed an interest in identifying "uniquely BBC" stories from around the world that could resonate with their respective audiences.

Frank focuses on helping teams discover impactful stories that are not strictly time- or event-driven, so that they are still relevant by the time they are transferred across teams.

How it works

BBC World Service is estimated to produce around 2400 articles per week. After filtering out day-to-day updates, about a quarter of these (~600 articles) are showcased in Frank, creating a level playing field for further analysis and utilisation.

For instance, in 2022 a story about a father's 26-year quest to find his daughter's murderer was first published by the Portuguese (Brasil) team. It was then manually picked up and reversioned by the Spanish-speaking (Mundo) team, where it attracted five times more page views. In both cases, users stayed on the page for more than two minutes and read the article to the end.

This suggests that enabling teams to identify and reuse relevant content from closely affiliated teams could offer a valuable investment for both parties by generating secondary audiences much larger than the intended primary audience.

The tool also helps journalists who speak multiple, potentially closely related languages (e.g. Serbian/Russian, Urdu/Arabic, Pashto/Persian) to arrive at the best possible outcome by comparing the source and its English translation with the target translation.

Journalists who tried out the tool have said that having machine translations as a starting point improved their turnaround times.

Corrections are stored in the system, so it potentially enables originators of an article to offer a 'validated' translation in English to serve as a master for further translations downstream.
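As an illustration only, the idea of stored corrections producing a 'validated' English master could be modelled along the following lines. This is a minimal sketch and not Frank's actual data model: the class names, field names, and the rule "latest validated version wins, otherwise fall back to the latest draft" are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Translation:
    """One English rendering of an article (hypothetical structure)."""
    text: str
    validated: bool = False  # True once a journalist has checked and corrected it


@dataclass
class Article:
    """An original-language article plus its English revision history (hypothetical)."""
    article_id: str
    source_language: str
    source_text: str
    # English versions, oldest first.
    english_versions: List[Translation] = field(default_factory=list)

    def add_machine_translation(self, text: str) -> None:
        """Store raw machine translation output as an unvalidated draft."""
        self.english_versions.append(Translation(text=text))

    def add_correction(self, corrected_text: str) -> None:
        """Store a journalist's correction as a new, validated version."""
        self.english_versions.append(Translation(text=corrected_text, validated=True))

    def english_master(self) -> Optional[Translation]:
        """Return the latest validated version, else the latest draft, else None."""
        for version in reversed(self.english_versions):
            if version.validated:
                return version
        return self.english_versions[-1] if self.english_versions else None
```

Under these assumptions, downstream teams would always translate from `english_master()`, so a later unvalidated machine draft never displaces a version that the originating team has already checked.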

One proposal for a further iteration is to help journalists to validate and promote translations proactively and reactively (on request from a peer).

Further investment in Frank could facilitate exploring solutions for aligning and clustering various versions of the same story, tracking pathways of content journeys, and notifying journalists of any changes to the parent content.
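One way to picture the tracking and notification idea is as a graph linking each reversioned article to its parent. The sketch below is purely illustrative and describes nothing that has been built: `ContentGraph`, its methods, and the article identifiers are hypothetical, and it assumes each version records at most one parent.

```python
from collections import defaultdict
from typing import Dict, List, Optional


class ContentGraph:
    """Records which article was reversioned from which parent (hypothetical model)."""

    def __init__(self) -> None:
        self._children: Dict[str, List[str]] = defaultdict(list)
        self._team: Dict[str, str] = {}

    def register(self, article_id: str, team: str, parent_id: Optional[str] = None) -> None:
        """Add an article and, if it is a reversion, link it to its parent."""
        self._team[article_id] = team
        if parent_id is not None:
            self._children[parent_id].append(article_id)

    def descendants(self, article_id: str) -> List[str]:
        """Return every downstream version of an article, breadth-first."""
        found: List[str] = []
        seen = {article_id}
        queue = list(self._children.get(article_id, []))
        while queue:
            current = queue.pop(0)
            if current in seen:  # guard against accidental cycles
                continue
            seen.add(current)
            found.append(current)
            queue.extend(self._children.get(current, []))
        return found

    def teams_to_notify(self, changed_id: str) -> List[str]:
        """Teams owning any downstream version of the changed article."""
        return sorted({self._team[d] for d in self.descendants(changed_id)})


graph = ContentGraph()
graph.register("pt-1", team="Brasil")
graph.register("es-1", team="Mundo", parent_id="pt-1")
graph.register("en-1", team="DigiHub", parent_id="es-1")
```

With this example data, a change to `pt-1` would flag both the Mundo and DigiHub teams, because their versions sit downstream of it on the content pathway.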

Frank was launched in August 2021 and was piloted by journalists from the Arabic, Chinese, Hausa, Igbo, Russian, Serbian, Tamil, Turkish, and central distribution (DigiHub) teams. Users have continued to use the tool habitually since the pilots.

The trial aimed to separate the experience of the tool and discovery processes from the quality of translations. The feedback was positive. Every participant said they would recommend the tool to a colleague.