Multilingual Article Tracker (Mat)

Matching translated news articles with the original English-language text.

Published: 2 February 2023

Aims

Match translated World Service news articles with the original English-language source content. This will allow us to measure how widely stories are distributed and read, and help inform editors’ commissioning decisions.

The problem

The BBC publishes news stories in more than 40 languages.

A central team called Digihub writes content in English for their colleagues in the language services to translate and publish to their respective audiences.

But there is currently no adequate automated way to monitor which stories the language services translate and publish.

In turn, this makes it difficult to know which of the original stories appealed to the different language services and their audiences.

This is important because the missing information would help editors decide what kind of stories their journalists should write.

Our solution

We designed a tool that would automatically ingest the English-language content from Digihub along with the published news articles from the BBC’s language services.

The tool would then compare the published, translated articles to the original English-language content to find similarities and match the source content with the translated text.

Graphic showing the flow of information

How we built Mat

We translated the published articles back into English using a BBC News Labs translation prototype, Frank, and the BBC's transcription tool Volt.

We used a pre-trained machine learning model to turn each text into a dense vector in a multi-dimensional space: essentially, a very long list of numbers.
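To make "text as a list of numbers" concrete, here is a toy sketch in Python. It is not the model Mat used, which is not named here. The function `toy_embed` and its hashed word-count approach are hypothetical stand-ins for a real pre-trained model, chosen only because they run without downloading anything.

```python
import math
import zlib

DIMENSIONS = 16  # assumption for the toy; real embedding models often use hundreds


def toy_embed(text: str, dims: int = DIMENSIONS) -> list[float]:
    """Map text to a fixed-length unit vector (a stand-in for a real model)."""
    vector = [0.0] * dims
    for word in text.lower().split():
        # crc32 is deterministic across runs, unlike Python's built-in hash().
        bucket = zlib.crc32(word.encode("utf-8")) % dims
        vector[bucket] += 1.0
    # Scale to unit length so texts of different sizes are comparable.
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else vector


vector = toy_embed("Germany delays sending tanks to Ukraine")
print(len(vector), vector)
```

A real pre-trained model places texts with similar meaning close together, even when they share no words. This toy only captures word overlap.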

We then used the k-nearest neighbors statistical algorithm to match the original content with the translated articles.
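A minimal sketch of the nearest-neighbour step, assuming cosine similarity as the measure of closeness (we have not stated here which measure Mat used). The vectors, article labels and the function name `nearest_neighbours` are invented for illustration.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


def nearest_neighbours(query, candidates, k):
    """Return the k candidates most similar to the query, best first."""
    scored = (
        (cosine_similarity(query, vector), article_id)
        for article_id, vector in candidates.items()
    )
    return [(article_id, score) for score, article_id in heapq.nlargest(k, scored)]


# Hypothetical three-dimensional vectors; real ones would be far longer.
source = [0.9, 0.1, 0.3]
candidates = {
    "es-analysis": [0.88, 0.12, 0.31],
    "ru-news": [0.6, 0.5, 0.3],
    "sport": [0.0, 0.1, 0.9],
}
matches = nearest_neighbours(source, candidates, k=2)
print(matches)
```

This compares the query against every candidate, which is fine for a day's worth of articles. A larger archive would probably call for an indexed search instead.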

Outcome

We tested the Mat prototype with content from Thursday 26 January 2023.

The Digihub team emailed the BBC’s language services a story examining why Germany had delayed sending tanks to Ukraine, comprising analysis by the BBC correspondent Katya Adler.

We asked Mat to find the 10 nearest matches to this English-language content.

Mat correctly picked up that the content had been translated into Spanish, Vietnamese and Japanese.

Graphic showing the translated articles

It gave a confidence rating of more than 97% that these were the right match.

It also identified seven other stories, in Russian, Arabic, Korean, Japanese, Vietnamese, Spanish and Nepali, but gave them a lower confidence rating. This indicated that they were very similar to the English-language original but might not be the right match.

These seven articles appeared to be versions of the news story about Germany sending tanks to Ukraine, rather than the analysis piece that our Digihub colleagues shared, which justifies the lower confidence rating.
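The split between likely matches and merely similar stories can be sketched as a simple threshold on the confidence rating. The 97% cut-off comes from the test above. The function name `split_by_confidence`, the article labels and the scores are hypothetical, and how Mat derives its confidence rating is not covered here.

```python
def split_by_confidence(results, threshold=0.97):
    """Separate results scoring above the threshold from the rest."""
    likely, similar = [], []
    for article_id, score in results:
        # "More than 97%" is read as strictly greater than the threshold.
        if score > threshold:
            likely.append((article_id, score))
        else:
            similar.append((article_id, score))
    return likely, similar


# Made-up scores for illustration only, not Mat's actual output.
results = [("article-a", 0.98), ("article-b", 0.97), ("article-c", 0.90)]
likely, similar = split_by_confidence(results)
print("Likely matches:", likely)
print("Similar stories:", similar)
```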

Next steps

News Labs is currently considering whether to spend more time refining the prototype.

Results

In a test, Mat correctly identified three translated articles which matched the English-language content.

Team

  • Clare Spencer

    Former research and development producer
  • Faith Ege

    Former News Labs Software Engineer
  • Tom Francis-Winnington

    Senior Software Engineer
  • Jack McPoland

    Software Engineer
  • Chris Nicholson

    Former News Labs Senior Software Engineer
  • Sarah Rainbow

    Senior Software Engineer
  • Sevi Sariisik Tokalac

    Senior Research and Development Producer
  • Rich Wareham

    Former News Labs Senior Software Engineer
  • Ben Nuttall

    Senior Software Engineer
