Captions - Who's talking when, in Parliament

Exploring how captioning workflows can infuse our linear output with structured data.


How might we automatically annotate and chapterise our Broadcast News via the act of captioning?

While the BBC are leaders in Broadcast News quality and craftsmanship, there are still challenges in workflows between digital and broadcast.

This project illuminates some opportunities for making our linear content more discoverable and reusable, by taking captions (aka "lower thirds" or "astons") and using them to mark chapter points.

How will we use caption data?

  1. Take captioning speakerID data and timing info.
  2. Match caption speakerIDs with BBC Linked Data URIs.
  3. Build a UX that allows the linear content to be sliced and diced according to the speaker of interest.
  4. Use ROT (Record of Transmission) as source video.
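The steps above can be sketched as follows. All of the speaker names, timings, and the URI mapping here are illustrative placeholders, not real caption or Linked Data records; the core idea is just that consecutive caption timings define chapter boundaries.

```python
# Sketch: turn timed caption speakerID events into chapters keyed by
# Linked Data URIs. Data below is purely illustrative.

# Caption events as (start_seconds, speaker) from the captioning workflow
captions = [
    (0.0, "Presenter"),
    (42.5, "MP One"),
    (180.0, "MP Two"),
    (260.0, "Presenter"),
]

# Hypothetical mapping of caption speakerIDs to Linked Data URIs
speaker_uris = {
    "MP One": "https://example.org/things/mp-one",
}

def chapterise(captions, duration, uri_lookup):
    """Each caption event opens a chapter; the next event (or the
    programme duration) closes it."""
    chapters = []
    for i, (start, speaker) in enumerate(captions):
        end = captions[i + 1][0] if i + 1 < len(captions) else duration
        chapters.append({
            "speaker": speaker,
            "uri": uri_lookup.get(speaker),  # None when no Linked Data match
            "start": start,
            "end": end,
        })
    return chapters

chapters = chapterise(captions, duration=600.0, uri_lookup=speaker_uris)

# Slicing by speaker of interest is then a simple filter over chapters
presenter_chapters = [c for c in chapters if c["speaker"] == "Presenter"]
```

With the chapters in hand, the ROT video can be cut at each chapter's start/end offsets to produce per-speaker clips.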

The starting page, with a list of caption chapters

This project extends the News Slicer project, which uses MOS Running Order data via MOSART to help with content chapterisation. We also see value in the manual act of captioning as a separate workflow and data source for chapterising our linear output.

In partnership with BBC R&D, we are looking at how we could use caption output to train our SpeakerID dataset (see the Transcriptor project).
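One way captions could feed SpeakerID training is as weak labels: each caption span nominates a stretch of audio as belonging to a named speaker. A minimal sketch, in which the timings, names, and the margin heuristic are all assumptions rather than the actual R&D pipeline:

```python
# Sketch: derive weakly labelled speaker-ID training segments from
# caption timings. Illustrative data only.

# Caption spans as (start_seconds, end_seconds, speaker)
captions = [
    (0.0, 42.5, "Presenter"),
    (42.5, 180.0, "Guest A"),
]

def training_segments(captions, margin=1.0):
    """Trim a safety margin from each span, since caption timings only
    loosely track the actual speech turns (astons often appear a beat
    after the speaker starts)."""
    segments = []
    for start, end, speaker in captions:
        s, e = start + margin, end - margin
        if e > s:  # drop spans too short to survive the trim
            segments.append({"speaker": speaker, "start": s, "end": e})
    return segments

segments = training_segments(captions)
```

Each resulting segment could then be cut from the ROT audio and paired with its speaker label as training data.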

If you want to get in touch, send us a tweet @BBC_News_Labs


  • The accuracy of the prototype was very impressive, but it was not considered a priority for transfer into production as a fully supported BBC tool.


Love data and code?

We'd like to hear from you.