Captions - Who's talking when, in Parliament

Status: closed
How might we automatically annotate and chapterise our Broadcast News via the act of captioning?

While the BBC are leaders in Broadcast News quality and craftsmanship, there are still challenges in workflows between digital and broadcast.

This project illuminates some opportunities in making our linear content more discoverable and reusable, by taking captions (aka “lower third”, or “astons”) and using them to mark chapter points.

How will we use caption data?

  1. Take captioning speakerID data and timing info.
  2. Match caption speakerIDs with BBC Linked Data URIs.
  3. Build a UX that allows the linear content to be sliced and diced according to the speaker of interest.
  4. Use ROT (Record of Transmission) as source video.
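As a rough illustration of steps 1 to 3, the sketch below turns caption events into chapters and filters them by speaker. It is a minimal sketch under stated assumptions, not the project's implementation: the `CaptionEvent` and `Chapter` types, the `build_chapters` and `chapters_for` helpers, the name-based lookup table and the `example.org` URIs are all hypothetical, and it assumes each caption marks the start of a chapter that runs until the next caption appears.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CaptionEvent:
    speaker: str   # speaker name as typed on the caption (hypothetical field)
    start: float   # seconds from the start of the recording


@dataclass
class Chapter:
    speaker: str
    uri: Optional[str]  # linked-data URI, or None when no match was found
    start: float
    end: float


def normalise(name: str) -> str:
    """Lower-case and collapse whitespace so near-identical names match."""
    return " ".join(name.lower().split())


def build_chapters(
    events: List[CaptionEvent],
    uri_lookup: Dict[str, str],
    programme_end: float,
) -> List[Chapter]:
    """Turn caption events into chapters, one chapter per caption.

    Assumption: a chapter lasts until the next caption appears, and the
    last chapter lasts until the end of the recording.
    """
    ordered = sorted(events, key=lambda e: e.start)
    chapters = []
    for i, event in enumerate(ordered):
        end = ordered[i + 1].start if i + 1 < len(ordered) else programme_end
        uri = uri_lookup.get(normalise(event.speaker))
        chapters.append(Chapter(event.speaker, uri, event.start, end))
    return chapters


def chapters_for(chapters: List[Chapter], uri: str) -> List[Chapter]:
    """Select only the chapters belonging to one speaker of interest."""
    return [c for c in chapters if c.uri == uri]


# Illustrative data only: placeholder names and placeholder URIs.
lookup = {"speaker a": "http://example.org/people/speaker-a"}
events = [
    CaptionEvent("Speaker B", 95.0),
    CaptionEvent("Speaker A", 12.0),
    CaptionEvent("speaker  a", 200.0),
]
chapters = build_chapters(events, lookup, programme_end=300.0)
speaker_a = chapters_for(chapters, "http://example.org/people/speaker-a")
```

Matching on a normalised name is the simplest possible linking strategy; a real feed would more likely carry a stable identifier, and unmatched speakers are kept here with a `None` URI rather than dropped.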

The starting page, with a list of caption chapters

This project builds on the News Slicer project. NewsSlicer uses MOS Running Order data via MOSART to help with content chapterisation. We also think there is value in treating the manual act of captioning as a separate workflow and data source for chapterisation of our linear output.

In partnership with BBC R&D, we are looking at how we could use Captions output to train our SpeakerID dataset (see the Transcriptor project).

If you want to get in touch, send us a tweet @bbc_news_labs

Next Priorities

  1. Set up data feeds from our in-house captioning tools
  2. Link captioning data with BBC Linked Data sets
  3. Use caption timing to overlay data/chapter points on longform output
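One way the third priority could be approached is to write chapter points out as a WebVTT chapters track, which HTML5 video players can load alongside the video. The choice of WebVTT is our assumption for illustration, not something the project has committed to, and the `vtt_timestamp` and `chapters_to_webvtt` helpers and the sample labels are hypothetical.

```python
from typing import List, Tuple


def vtt_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def chapters_to_webvtt(chapters: List[Tuple[float, float, str]]) -> str:
    """Build a WebVTT document from (start, end, label) tuples."""
    lines = ["WEBVTT", ""]
    for index, (start, end, label) in enumerate(chapters, start=1):
        lines.append(str(index))  # cue identifier
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(end)}")
        lines.append(label)       # chapter title shown by the player
        lines.append("")          # blank line ends the cue
    return "\n".join(lines)


# Illustrative chapter points only.
vtt = chapters_to_webvtt([
    (12.0, 95.0, "Speaker A"),
    (95.0, 200.0, "Speaker B"),
])
```

The resulting text can be saved as a `.vtt` file and attached to a video element as a track of kind `chapters`.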