17 December 2021

Student challenge report: Imperial College London and the University of Oxford take on major questions facing the news industry

Student challenges are a way for us at Labs to engage with the talented early innovators at universities. We try to bring academic work closer to real-world problems in the news industry with a collaborative approach, whilst having some fun along the way.

Labber Lei He, together with Allison Shultes and Conor Molumby, led the latest challenge with PhD students from Imperial College and the University of Oxford, at the EPSRC Centre for Doctoral Training (CDT) in the Mathematics of Random Systems: Analysis, Modelling and Simulation.

The challenge we set

To inspire the students, we tasked them with trying to create innovative solutions to help us serve the BBC's audiences:

  • How might we personalise user experiences while meeting the BBC’s public service duties?
  • How might we increase trust in the BBC?
  • How might we demonstrate our commitment to impartiality?
  • How might we give our journalists insights into how our audiences are thinking?

These questions get to the heart of the problems we are trying to solve in News Labs. We gave the students an overview of our recent prototypes that address some of these challenges and talked them through how we form solutions to experiment on.

Then it was over to the two teams of students. They had two weeks to form ideas and prototype their solutions. Here's what each team came up with...

One image from many

Inspired by News Labs’ Graphical Storytelling project, the first team tried to address the challenge of impartiality in news by creating a single visual representation of a news story.

Could this be used as a quick visual summary of a story for the sections of the audience that prefer images over text? Could the creation of this new image from two or more images for a story - initially selected by a journalist - be automated?

To bring the idea to life the students took inspiration from their studies of the theory of optimal transport of probability measures to merge multiple images.

A graduated representation of the coronavirus spike protein and a vaccine syringe merging to form a single image. An illustration of the breakdown of the BBC's blocks logo into its constituent colours.

Experiments included merging two images to form a new one, and breaking down a single image into its constituent colours.

Their goal was to interpolate between probability distributions in an optimal way: continuously and along the shortest path. Their work also included a colour transfer algorithm that applies the motif of one image to the objects of another. For those interested in how it worked under the hood, check out the Python optimal transport and neural-enhance libraries.
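To make the idea of interpolating "along the shortest path" concrete, here is a minimal sketch in plain Python. It is not the students' implementation: it assumes the simplest possible case of one-dimensional data with equal sample counts, where the optimal transport plan pairs values of equal rank. The function names `displacement_interpolation` and `transfer_channel` are hypothetical, and real images would need a multi-dimensional solver such as those in the libraries mentioned above.

```python
def displacement_interpolation(xs, ys, t):
    """Interpolate between two equal-sized 1D samples along the optimal transport path.

    Assumption: in one dimension the optimal plan matches the k-th smallest
    value of xs with the k-th smallest value of ys.
    t = 0 returns sorted xs, t = 1 returns sorted ys.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch needs samples of equal size")
    return [(1 - t) * x + t * y for x, y in zip(sorted(xs), sorted(ys))]


def transfer_channel(source, reference):
    """Rank-matching colour transfer for a single colour channel.

    Each source pixel keeps its position but takes the value that has the
    same rank in the reference channel.
    """
    if len(source) != len(reference):
        raise ValueError("this sketch needs channels of equal size")
    # Indices of the source pixels, ordered from darkest to brightest.
    order = sorted(range(len(source)), key=lambda i: source[i])
    reference_sorted = sorted(reference)
    result = [0] * len(source)
    for rank, pixel_index in enumerate(order):
        result[pixel_index] = reference_sorted[rank]
    return result


if __name__ == "__main__":
    # Halfway between the two samples.
    print(displacement_interpolation([0, 2], [10, 4], 0.5))
    # The brightest source pixel receives the brightest reference value.
    print(transfer_channel([30, 10, 20], [5, 250, 90]))
```

Running the same rank matching on each colour channel of two images gives a crude colour transfer. It ignores how the channels relate to one another, which is one reason a full optimal transport solver is normally preferred.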

Making BBC stories more accessible to non-native English speakers

Inspired by News Labs’ BeX project, a second team looked at tackling the challenge of improving the readability of domestic news stories for non-native English speakers.

7% of the UK population are non-native speakers (2011 Census). The BBC reaches 468.2m people worldwide weekly, with 42 language services and English language output.

The team decided to tackle this challenge by offering insights to journalists and editors in the hope that their solution could inform editing choices. This was based on a score system for readability as the story is drafted, which is similar to the BeX project and other readability solutions.

The team’s unique leap was in identifying the hypothesis that a readability score for native speakers is not the same as that for non-native speakers. This was based on their analysis of the characteristics of the language comprehension problem for non-native speakers, including:

  • Word ambiguity - for example, “the committee chair sat in the centre chair.”
  • Word length - are there more appropriate measures of difficulty? For example, a loan word like “doppelgänger” is long, but understandable by a German speaker.

The score is calculated by using back translation. This involves automatically translating the original text to another language, and then translating the result back to the source language again. By comparing the original text and the output of this process, the team calculated a score which is an estimation of the readability of the text for non-native speakers.

A process flow diagram showing the translation of a piece of text to another language and then back to English again using a machine translation model.

The process assumes the first machine translation from English to another language represents a non-native English speaker's understanding of the news story.

The solution assumes that the first machine translation in this process simulates a non-native speaker's understanding of the story. It also assumes that a large number of differences between the original text and the output text points to ambiguities that make the story harder to understand for non-native English speakers.
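The round trip described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the team's code: `toy_translate` is a hypothetical stand-in for a machine translation model (it merges two English words into one invented "foreign" word to mimic an ambiguity), the language code "xx" is made up, and the token-level comparison from the standard library's `difflib` is only one possible way to score the difference.

```python
from difflib import SequenceMatcher

# Hypothetical toy vocabulary: two English words collapse into one foreign word,
# so the distinction between them is lost on the way back.
TOY_EN_TO_XX = {"chair": "stuhl", "seat": "stuhl"}
TOY_XX_TO_EN = {"stuhl": "chair"}


def toy_translate(text, source_lang, target_lang):
    """Stand-in for a real translation model; unknown words pass through unchanged.

    target_lang is unused here and kept only so the signature resembles a
    real translator's.
    """
    table = TOY_EN_TO_XX if source_lang == "en" else TOY_XX_TO_EN
    return " ".join(table.get(word, word) for word in text.lower().split())


def back_translate(text, pivot_lang, translate):
    """Translate English text to the pivot language and back to English."""
    forward = translate(text, "en", pivot_lang)
    return translate(forward, pivot_lang, "en")


def readability_score(text, pivot_lang, translate):
    """Score in [0, 1]: similarity between the original and round-trip text."""
    round_trip = back_translate(text, pivot_lang, translate)
    original_tokens = text.lower().split()
    round_trip_tokens = round_trip.lower().split()
    return SequenceMatcher(None, original_tokens, round_trip_tokens).ratio()


if __name__ == "__main__":
    for sentence in ("the chair took a break", "the chair took a seat"):
        print(sentence, "->", readability_score(sentence, "xx", toy_translate))
```

Passing the translator in as an argument keeps the scoring logic independent of any particular translation service, so a real model could be swapped in without changing `readability_score`.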

The team used their data science skills to test three different algorithms for calculating the similarity between each input story and its output from the process. Using an open-source dataset of English stories from the Guardian, they also calculated an average similarity score across the entire dataset for each of a number of languages used in the translation step.

A table showing the average similarity score for a range of languages including the top scoring: Welsh, German and Spanish.

A score closer to one indicates a high level of similarity and thus readability for a non-native English speaker fluent in the listed language. The column names are the text similarity analysis algorithms used to compare the original and twice-translated English texts.
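The article does not name the three algorithms the team compared, so the sketch below uses three common text similarity measures purely as assumptions: Jaccard overlap of word sets, cosine similarity of word counts, and the sequence ratio from Python's `difflib`. The helper `average_scores` is hypothetical and shows how a per-language average could be computed from pairs of original and twice-translated stories.

```python
import math
from collections import Counter
from difflib import SequenceMatcher


def tokens(text):
    """Lowercase whitespace tokenisation; a deliberately simple assumption."""
    return text.lower().split()


def jaccard(a, b):
    """Shared distinct words divided by all distinct words."""
    set_a, set_b = set(tokens(a)), set(tokens(b))
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def cosine(a, b):
    """Cosine of the angle between the two word-count vectors."""
    count_a, count_b = Counter(tokens(a)), Counter(tokens(b))
    dot = sum(count_a[word] * count_b[word] for word in count_a)
    norm_a = math.sqrt(sum(v * v for v in count_a.values()))
    norm_b = math.sqrt(sum(v * v for v in count_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def sequence_ratio(a, b):
    """Order-sensitive similarity of the two token sequences."""
    return SequenceMatcher(None, tokens(a), tokens(b)).ratio()


MEASURES = {"jaccard": jaccard, "cosine": cosine, "sequence": sequence_ratio}


def average_scores(pairs):
    """Average each measure over (original, round_trip) text pairs for one language."""
    if not pairs:
        raise ValueError("need at least one pair of texts")
    return {
        name: sum(measure(original, round_trip) for original, round_trip in pairs) / len(pairs)
        for name, measure in MEASURES.items()
    }


if __name__ == "__main__":
    pairs = [("a b c", "a b c"), ("a b c", "a b d")]
    print(average_scores(pairs))
```

All three measures return values between zero and one, which makes them directly comparable as columns in a table of per-language averages.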

And the winner is…

"Mathematicians at Imperial engage widely and actively with society and with industry. By working with the BBC on this project, our CDT students gained valuable experience which will support their own development into mature, engaged researchers," said Dr Thomas Cass, Co-Director of the CDT in the Mathematics of Random Systems at Imperial College London.

The News Labs team members were left with the unenviable task of picking a winning idea from the above. In typical Labs fashion we broke this problem down by setting criteria:

  1. How innovative is the idea?
  2. The approach to solving the problem identified, including the creativity and originality of the solution
  3. The potential applicability of the solution to the BBC and wider industry

Overall, we were really impressed with the work of both teams and we thought there was very little to separate them, but in the end team two’s back translation solution narrowly took the victory.

We were particularly delighted by the creativity shown by both teams.

What’s next?

Part of News Labs’ mission is to build partnerships with academic, not-for-profit and commercial bodies around the world to collaborate on innovation.

We plan to run more university challenges next year with the aim of bringing computer science and journalism students together to come up with innovative ideas for real-world problems in news media.

