News Labs’ AI Researcher Fionntán O’Donnell and Software Engineer Remi Oduyemi built a prototype to highlight and identify the most important people in raw video footage. Here’s what they learned during the process.
This is the second article in a series exploring facial recognition. For more background on how facial recognition works, check out A Brief Guide to BBC R&D’s Video Face Recognition.
Part of our job in News Labs is to reimagine how the technology coming out of BBC Research and Development might be used by journalists in the newsroom. One of R&D’s projects that we’ve been especially interested in exploring is an in-house facial recognition tool called FaceRec, developed by the Internet Research and Future Services’ Data team.
We wondered: could this technology help editors and producers based in our newsrooms?
To test it out, we designed a web-based prototype to help highlight and identify the most important people in raw video footage. We also did some user research with teams in the newsroom, which helped us identify new use cases for image recognition in future projects. Here’s a quick look at our design and ideas we have for how the technology might better serve our users.
The use case: understanding how newsroom production teams work
There are many ways we could display the output from FaceRec. However, before deciding on a concrete design, it was crucial that we understood video journalists’ current workflows and their thoughts on how the technology might be useful. So we did some good old-fashioned user research — finding potential users, talking to them about their workflows, collecting the observations and reporting these back to our team.
When creating packages for the news, producers and video editors often think in terms of shots, which they search for within our media asset management systems. Many of these are typical news shots, such as:
- Buildings (think Canary Wharf shots for news about the economy and financial firms)
- People shaking hands (politicians meeting)
- People walking (to allow voiceover introducing an interviewee)
- Two/three shots: shots of two famous or newsworthy people together (think of the well-used video of Donald Trump walking with Theresa May).
Take the following BBC piece on the expansion of high rises as an example. If you skip around within the video, you’ll see many different types of shots here: interviews, archive BBC footage, establishing shots, private companies’ promo videos and so on.
The prototype: what we built
Our prototype is made up of individual components designed to ease the job of video editing. Raw video footage, such as rushes shot in the field, is processed by FaceRec, and the data is then used by our tool to help highlight the most important people in the video and suggest who they might be.
A screenshot of our full prototype
The components include:
- a personal jobs list of processed videos
- a bounding box overlay on faces detected during video playback
- an interactive histogram displaying a density gradient of detected faces
- an interactive timeline displaying the name of a detected face if known
- thumbnails, tracks and a sub clip of each person detected in the video
Let’s have a look at each one in detail.
Editors can easily monitor the real-time status of uploaded videos using a jobs list. Once a file is uploaded, editors can expand their selected footage to reveal the video player and other face recognition tools beneath.
We gave each job the same title as the one that appears in our video asset management system so that editors can quickly spot their completed video packages when browsing.
Bounding box overlay
The data from FaceRec contains coordinates that pinpoint the location of each face it detects on a frame. We used these coordinates to add a bounding box to processed shots. This allows editors to more easily track detected faces during video playback.
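As a rough illustration of the overlay step, the sketch below scales a face box from source-frame pixels to the size of an on-screen player. The `Box` type, the pixel-based coordinate format and the frame sizes are assumptions made for this example; the article does not describe FaceRec's actual output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixels: top-left corner plus size (hypothetical format)."""
    x: float
    y: float
    width: float
    height: float


def scale_box(box: Box, source_size: tuple[int, int], player_size: tuple[int, int]) -> Box:
    """Map a box from source-frame pixels to on-screen player pixels."""
    source_w, source_h = source_size
    player_w, player_h = player_size
    scale_x = player_w / source_w
    scale_y = player_h / source_h
    return Box(
        x=box.x * scale_x,
        y=box.y * scale_y,
        width=box.width * scale_x,
        height=box.height * scale_y,
    )


# A face detected in a 1920x1080 frame, drawn on a 960x540 player.
face = Box(x=480, y=270, width=192, height=216)
overlay = scale_box(face, source_size=(1920, 1080), player_size=(960, 540))
print(overlay)
```

Scaling each axis separately keeps the box aligned with the face even if the player does not preserve the source aspect ratio.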
The bars of the histogram represent the number of people in each video frame. It’s interactive, allowing journalists to navigate to different points in the video by selecting points on the chart. This allows editors to easily skim to the points where they suspect that the shots they’re looking for may be, based on the size of the crowd in the frames.
For example, if an editor is looking for a shot containing the entirety of the Royal Family, they could easily navigate to the time codes where multiple faces are detected by FaceRec. Editors could also use the histogram feature to find landscape or scenery shots where no people are present, e.g. a backdrop shot of The Palace of Westminster before the members of parliament enter.
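The counting behind a histogram like this can be sketched in a few lines. The input format here, a list of `(frame_number, face_id)` pairs, is an assumption for illustration rather than FaceRec's real data structure, and the helper names are hypothetical.

```python
from collections import Counter


def faces_per_frame(detections: list[tuple[int, str]], frame_count: int) -> list[int]:
    """Return the number of detected faces for every frame index."""
    counts = Counter(frame for frame, _face_id in detections)
    return [counts.get(frame, 0) for frame in range(frame_count)]


def frames_with_at_least(histogram: list[int], minimum: int) -> list[int]:
    """Frames showing a crowd of at least `minimum` faces, e.g. group shots."""
    return [frame for frame, count in enumerate(histogram) if count >= minimum]


def empty_frames(histogram: list[int]) -> list[int]:
    """Frames with no detected faces, e.g. landscape or scenery shots."""
    return [frame for frame, count in enumerate(histogram) if count == 0]


# Hypothetical detections: (frame number, face identifier).
detections = [(0, "a"), (0, "b"), (1, "a"), (3, "a"), (3, "b"), (3, "c")]
histogram = faces_per_frame(detections, frame_count=5)

print(histogram)
print(frames_with_at_least(histogram, 2))
print(empty_frames(histogram))
```

The same list can drive both searches described above: high bars point to group shots, and zero-height bars point to shots with nobody in frame.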
The Data Team built an interactive timeline into their original face recognition prototype using the open-source visjs library, which we repurposed for our tool. The timeline displays the names of identified faces and labels those it doesn’t recognise as ‘Unknown’. Users can skip to any point — or detected face — in the video, making it easier to find shots containing specific people.
The tracks viewport is composed of three collapsible panels:
- people panel
- tracks panel
- clip panel
The tracks viewport
The people panel displays thumbnails of all the people, known or unknown, detected in the video. Editors can select one or more people to reveal the frames in which they appear. These tracks, displayed in the tracks panel, act as cue points that allow the user to jump to a specific time in the video where the selected face appears.
A sub-clip is also displayed, which shows only footage of the selected person in the clip panel. We thought editors might find this useful when making commemorative videos, e.g. when honouring a public figure’s work.
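One way to think about tracks and cue points is as runs of consecutive frames in which the same person appears. The sketch below is an assumption about how such grouping could work, not a description of how FaceRec or the prototype builds its tracks; `build_tracks`, `cue_points` and the `max_gap` parameter are hypothetical names.

```python
def build_tracks(frames: list[int], max_gap: int = 1) -> list[tuple[int, int]]:
    """Merge frame numbers into (start, end) runs, joining frames at most max_gap apart."""
    tracks: list[tuple[int, int]] = []
    for frame in sorted(set(frames)):
        if tracks and frame - tracks[-1][1] <= max_gap:
            # Extend the current run to include this frame.
            tracks[-1] = (tracks[-1][0], frame)
        else:
            # Start a new run.
            tracks.append((frame, frame))
    return tracks


def cue_points(tracks: list[tuple[int, int]], fps: float) -> list[float]:
    """Convert the first frame of each track into a playback time in seconds."""
    return [start / fps for start, _end in tracks]


# Hypothetical frame numbers in which one selected person was detected.
frames = [10, 11, 12, 50, 51, 52, 53, 200]
tracks = build_tracks(frames)

print(tracks)
print(cue_points(tracks, fps=25))
```

Raising `max_gap` would let a track survive a few frames where detection briefly fails, at the cost of occasionally joining two separate appearances.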
Now that we have our demo, our next step is to hold further discussions within the BBC about how useful face recognition could be to the business, drawing on our prototype and user research.
During our six-week investigation, we identified some limitations with the underlying technology itself, including:
- Reliability. FaceRec currently isn’t accurate enough to automatically tag broadcast content.
- Detection. FaceRec still struggles to recognise people’s profiles in shots.
- Predictability. It isn’t clear why FaceRec is able to label some faces accurately and not others.
- Maturity. The tool doesn’t yet feel ready, which suggests users may not feel confident using it. (However, part of this investigation was to see whether R&D should continue to work on improving FaceRec.)
- Product transfer. It’s not clear to us what internal system will eventually inherit and support a facial recognition tool.
We also found some additional use cases for object and face recognition based on our user research. These could potentially lead to future projects for our team and BBC Research and Development:
Automated dope sheets
“Dope sheet” is an industry term for a video metadata file that contains a shot-by-shot description of a video. It usually includes the type of shot (e.g. close-up, medium shot), time stamps and who or what is in the shot.
News agencies usually provide producers with a detailed dope sheet, but often our journalists can’t due to time pressures. Using face recognition and other image analysis tools, we could attempt to automatically create a basic dope sheet for all incoming video feeds, which would help editors and producers search for the shots they need.
Where’s my interviewee?
At a busy event with a lot of people, it’s often hard to pick out the exact person the journalist wants to quote or feature in a range of different shots. Face recognition could help speed up that process.
From Recognition to Suggestion
A repeated concern among producers was that FaceRec might misidentify people. It would simply be unacceptable if the BBC were to show an interview with Theresa May and label her onscreen as Angela Merkel. Because we’re unable to guarantee a 100% accuracy rate, it seems unlikely we can use face recognition for the unsupervised tagging of faces for broadcast.
However, what we could offer is automatic face suggestion — perhaps within a tool that suggests the people in a keyframe to someone adding metadata to the video. This could work with an opt-in click so they can quickly tag many more shots, instead of typing all names in. A prototype like this would balance the push for more content metadata with the security of a human-in-the-loop AI tool.
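To make the opt-in idea concrete, here is a minimal sketch in which suggestions are only offered, and a tag is stored only after a person accepts it. Everything in it is hypothetical: the `Suggestion` and `Keyframe` types, the confidence score in the range 0 to 1 and the 0.6 threshold are assumptions for illustration, not features of FaceRec.

```python
from dataclasses import dataclass, field


@dataclass
class Suggestion:
    name: str
    confidence: float  # hypothetical score between 0 and 1


@dataclass
class Keyframe:
    frame: int
    suggestions: list[Suggestion]
    confirmed_tags: list[str] = field(default_factory=list)


def shortlist(keyframe: Keyframe, min_confidence: float = 0.6, limit: int = 3) -> list[Suggestion]:
    """Return the strongest suggestions to show to the person adding metadata."""
    candidates = [s for s in keyframe.suggestions if s.confidence >= min_confidence]
    return sorted(candidates, key=lambda s: s.confidence, reverse=True)[:limit]


def confirm(keyframe: Keyframe, name: str) -> None:
    """Record a tag only after a person has explicitly accepted an offered suggestion."""
    offered = {s.name for s in shortlist(keyframe)}
    if name not in offered:
        raise ValueError(f"{name!r} was not among the offered suggestions")
    if name not in keyframe.confirmed_tags:
        keyframe.confirmed_tags.append(name)


keyframe = Keyframe(
    frame=120,
    suggestions=[Suggestion("Person A", 0.91), Suggestion("Person B", 0.42)],
)

print([s.name for s in shortlist(keyframe)])
print(keyframe.confirmed_tags)  # nothing is tagged until someone opts in
confirm(keyframe, "Person A")
print(keyframe.confirmed_tags)
```

The key design choice is that nothing reaches `confirmed_tags` automatically: the model narrows the options, and the human makes the final call.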
As most developers will tell you, producing a demo for a technology is very different from getting it to work in production at scale. While we’re very proud of the demo we created in six weeks, we’re also fully aware of the challenges in getting face recognition deployed: gathering accurately tagged images of people, retraining models, handling thousands of videos a day and simply having the budget and people necessary for long-term development. As such, an important part of our work is talking to teams who could take over development of FaceRec.
However, the challenges that come from navigating between new ideas, users, great tools and software development are what News Labs is all about. We look forward to figuring out the next steps in where face recognition, and indeed other AI techniques, can help the BBC.