Can we help journalists quickly create summaries, captioned image galleries and other story modes using automation?
BBC News needs to reach all audiences in the UK as part of its public service remit. The existing text article format doesn't appeal to everyone. Can we use semi-automation to create bullet point summaries, captioned image galleries and even videos that offer different ways to consume the same story?
Since the BBC News website went live in the 1990s, the fundamentals of the news story format have not changed. Paragraphs of text, structured much like newspaper stories and accompanied by a couple of pictures, have been the default components of news webpages.
In 2020 and beyond, should we be offering alternative ways to consume stories and, if so, can the latest technology help us create these story modes efficiently?
As a first step in experimenting in this space, we explored creating bullet point summaries and image galleries with captions, building on the knowledge we gained on a previous project, Graphical Storytelling.
Are open source tools and models good enough?
We started by researching what's already out there for creating automatic summaries. Open source machine learning models fall into two categories: abstractive models, which process the text given to them and write a new summary, and extractive models, which pick key sentences from the text rather than writing anything new.
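To make the extractive idea concrete, here is a minimal sketch in Python. It is not the model used in the prototype: it swaps in a simple word-frequency score, and the function names, the stopword list and the naive sentence splitter are all assumptions made for illustration. The point it shows is that an extractive summary only ever reuses sentences that already exist in the story.

```python
import re
from collections import Counter

# Small illustrative stopword list (an assumption; a real system would use a fuller one).
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it",
             "that", "on", "for", "was", "with", "as", "at", "by"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text):
    """Lowercased words with stopwords removed."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def extractive_summary(text, n=2):
    """Return the n highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        if not tokens:
            return 0.0
        # Average document-wide frequency of the sentence's content words.
        return sum(freq[w] for w in tokens) / len(tokens)

    top = sorted(range(len(sentences)),
                 key=lambda i: score(sentences[i]),
                 reverse=True)[:n]
    # Sentences are copied verbatim from the story; nothing new is written.
    return [sentences[i] for i in sorted(top)]
```

An abstractive model, by contrast, generates new wording, which is why its output needs closer checking by a journalist before publication.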
We've tested the models within News Labs and with journalists who write regularly for the News website, and have combined both approaches in our prototype. The summaries are created using an abstractive model called PEGASUS, while the image captions are extracted from the story text using a tool called BERT.
To build the image gallery, we process the extracted summary sentences that make up its captions with Starfruit, a tool from our colleagues in BBC R&D. Starfruit identifies relevant entities in each sentence, including names, places and topics. These entities are then used to query the image library from which journalists in the newsroom currently pick pictures for their stories manually.
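The flow from caption entities to candidate images can be sketched roughly as follows. This is an illustration under stated assumptions, not the prototype's code: Starfruit's real interface and the image library's search API are not described here, so the entity dictionary, the `Image` class, the tag-overlap ranking and all function names below are hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass
class Image:
    """Hypothetical image record: an id plus lowercase descriptive tags."""
    image_id: str
    tags: set


def build_query(entities):
    """Flatten entities (type -> list of strings) into unique lowercase terms.

    The 'people', 'places' and 'topics' keys are assumed for illustration.
    """
    terms = []
    for kind in ("people", "places", "topics"):
        for entity in entities.get(kind, []):
            term = entity.lower()
            if term not in terms:
                terms.append(term)
    return terms


def search_images(library, terms, limit=3):
    """Rank images by how many query terms appear in their tags."""
    wanted = set(terms)
    scored = []
    for image in library:
        overlap = len(image.tags & wanted)
        if overlap:
            scored.append((overlap, image))
    # Stable sort: images with equal overlap keep their library order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [image for _, image in scored[:limit]]


def filter_by_entity(images, entity):
    """Keep only the images tagged with one chosen entity."""
    return [image for image in images if entity.lower() in image.tags]
```

In this sketch, each caption sentence would produce its own entity dictionary and therefore its own ranked list of candidate images, which a journalist could then narrow down with `filter_by_entity`.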
The user interface of the prototype allows journalists to see alternative images and to filter based on entities, as well as offering custom search if nothing suitable appears from the automated search results.
- For the summary mode, initial feedback was mixed. Some journalists found the machine-generated summaries useful as a starting point to then sub-edit, whilst others felt it would be more efficient to write a summary from scratch.
- Image galleries are now in the early stages of internal testing.
We are looking at continuing this experiment with pilot users in the newsroom and other teams at the BBC.
Tweet us @BBCNewsLabs if you want to find out more.