We kicked off February by hosting a Transcriptor #newsHACK to explore the use of machine-assisted transcription technologies with BBC Connected Studio. Eleven teams of journalists and developers from across the UK and Europe participated. Here's what they built, what we learnt and where we hope to go from here.
Machine-assisted transcription has the potential to be of huge benefit for news organisations and for people who want to understand the news. We were looking for prototypes that focused on either audience-facing formats or journalists' workflows.
Teams were invited to use their own speech-to-text service or the resources we provided during the two-day event. We've been developing a new data format for transcripts using our in-house speech-to-text engine, which we made available for the hack.
Transcripts for the hack were generated using the BBC's version of Kaldi, an open-source speech-to-text engine. We've been working on a new transcript model as part of our work around automated transcription and media enrichment. The transcripts are delivered as JSON files that contain:
- An array of speakers, matched to transcript segments by their index position. Each speaker's name property is null by default.
- An array of segment objects. Each segment includes:
  - An array of word objects, each with start and end timecodes and a text property containing the identified word.
  - A speaker id property, which is the speaker's index in the speakers array.
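The structure described above can be sketched as follows. This is an illustrative reconstruction from the prose, not the published schema, so the exact field names are assumptions:

```python
import json

# Illustrative sketch of the transcript JSON described above.
# Field names are assumptions based on the prose, not the real schema.
transcript = {
    "speakers": [
        {"name": None},  # names default to null until identified
        {"name": None},
    ],
    "segments": [
        {
            "speaker": 0,  # index into the "speakers" array
            "words": [
                {"start": 0.00, "end": 0.42, "text": "Good"},
                {"start": 0.42, "end": 0.89, "text": "evening"},
            ],
        },
        {
            "speaker": 1,
            "words": [
                {"start": 1.10, "end": 1.55, "text": "Thanks"},
            ],
        },
    ],
}

# Resolve a segment's speaker by its index position.
first_segment = transcript["segments"][0]
speaker = transcript["speakers"][first_segment["speaker"]]
print(json.dumps(speaker))  # an unnamed speaker: {"name": null}
```

Keeping speakers in a separate array means a speaker can be named once, after the fact, and every segment attributed to them picks up the change.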
To learn more about our work in this area, see the page for our OCTO project. To take a look at our transcript model, send us an email at newslabs[at]bbc.co.uk.
The categories for the competition were:
- Most irresistible transcript-assisted production tools for a/v editing
- Most compelling new format
- "Surprise Us"
Most irresistible transcript-assisted production tool: the Telegraph
Following one of the most intensely debated issues in the industry — what to do about "fake news" — the team from the Telegraph prototyped an Android app that fact-checks live speech using a combination of speech-to-text transcription and entity extraction. Concepts are identified in near real-time, and the user is directed to Wikipedia articles in order to verify claims from the speaker.
The team said that future iterations of the prototype could potentially scrape Wikipedia pages for more relevant facts, rather than just pointing journalists to the appropriate entry.
Most compelling new format: BBC News Labs
Four developers from our team and a fifth from BBC Broadcast Systems and Development (BSD) won the category with a tool for indexing videos based on their transcripts. The problem of video discovery affects both journalists, who might want to extract a certain segment for inclusion in a broadcast or online video, and audiences, who want to search a news segment without knowing what the programme is called.
The team's BBC Live Search prototype transcribes live broadcast content and makes it searchable by its transcript, so the audience can find a programme simply by remembering a word or phrase from it. In addition to finding clips, it returns the video cued to the playback position of the searched-for phrase.
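Word-level timecodes make the "jump to the phrase" behaviour straightforward. The sketch below shows one way it could work, assuming the timecoded word objects described earlier; it is not the team's actual implementation:

```python
# Hedged sketch of transcript-based clip search: given a flat list of
# timecoded word objects, find where a phrase starts so the player can
# be cued there. Not the BBC Live Search team's actual code.
def find_phrase(words, phrase):
    """Return the start time of the first occurrence of `phrase`,
    matched against consecutive transcribed words, or None."""
    tokens = phrase.lower().split()
    texts = [w["text"].lower() for w in words]
    for i in range(len(texts) - len(tokens) + 1):
        if texts[i:i + len(tokens)] == tokens:
            return words[i]["start"]
    return None

words = [
    {"start": 12.0, "end": 12.4, "text": "breaking"},
    {"start": 12.4, "end": 12.9, "text": "news"},
    {"start": 13.0, "end": 13.3, "text": "tonight"},
]
print(find_phrase(words, "breaking news"))  # 12.0 — seek the player here
```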
"Surprise Us": Ericsson and LiveWyer
This combo team won the "surprise us" category with a tool that performs live video translations. While not strictly a transcript tool, it takes subtitles from live broadcasts and generates a voiced-over translation using Google Translate and Amazon Polly. The original subtitles aren't translated, but the audio can be generated in one of fourteen languages.
The tool could be further developed by automatically changing speaker voices based on subtitle colour, or else using automatic speech recognition to assign speaker gender without editorial input. The judges said they were particularly won over by implications for speakers of minority languages, who might otherwise be underserved when it comes to news content.
The winning teams from the Telegraph, Ericsson and LiveWyer, and BBC News Labs and Broadcasting Systems.
Full List of Prototypes
BBC Rewind: Responsive subtitles and video files
The team from BBC Rewind built a two-part tool for automatically cropping and subtitling videos using only one media file and one .srt file (the standard format for video subtitles).
The team's Subtitler interface allows journalists to set start and end timecodes for different subtitle segments. Their Framer tool lets journalists export videos in different formats — vertical or square for social media sharing, for example — using the idea of "objects". Future iterations of the toolkit would automatically detect where objects appear in a shot, and so know where it's safe to place subtitles for a given crop.
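For readers unfamiliar with the .srt format mentioned above, it's a plain-text list of numbered cues, each with a timecode range and caption text. A minimal reading of it might look like this (an illustrative sketch, not the Rewind team's code, and it skips edge cases such as multi-line captions with blank lines):

```python
# Minimal sketch of reading the .srt subtitle format: numbered cues,
# a "start --> end" timecode line, then the caption text.
SRT = """\
1
00:00:01,000 --> 00:00:03,500
Hello and welcome.

2
00:00:04,000 --> 00:00:06,000
Here is the news.
"""

def parse_srt(text):
    """Parse .srt cues into (start, end, caption) tuples;
    timestamps are kept as strings."""
    cues = []
    for block in text.strip().split("\n\n"):
        lines = block.splitlines()
        start, end = lines[1].split(" --> ")
        cues.append((start, end, " ".join(lines[2:])))
    return cues

for start, end, caption in parse_srt(SRT):
    print(start, caption)
```

Because each cue already carries its own timecode range, a tool like Subtitler only needs to adjust those ranges rather than re-align text to audio.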
BBC Monitoring: Adding context to audio
This team's tool used the idea of topic-based navigation of video content to add context to media assets. After performing entity extraction on generated transcripts, the tool presents users with a list of tags that they can visualise across the duration of a video with a histogram beneath the media player. Related content is also surfaced, which can then be explored as companion files to the original video.
BBC BSD: WordWrecker for metadata discovery
Like BBC Monitoring, the team from BBC Broadcast Systems Development built a prototype using visuals to represent the content of a video file. They chose to build a segmented timeline, where different segments are represented by the most prevalent words or themes in the timeframe. As the video plays, a wordcloud is automatically generated which updates word size based on the number of utterances across the whole clip. Clicking a word produces a list of start and stop times for the other points in the video where that word appears.
BBC Research and Development: NewsPump, a companion tool to broadcast content
NewsPump produces a live feed of online content on the same subject as news that's being broadcast. The idea behind the companion tool is that viewers might not always be in a position to listen to audio during live broadcasts — and even when subtitles are available, they're written to be "heard" alongside speech rather than read by an online audience.
Deutsche Welle: Auto-translated transcripts and subtitles
The team from Deutsche Welle created a tool that automatically transcribes and translates video content from one language to another. The text editor allows for manual parsing of the transcript to create subtitles for the new video file, and it also uses its transcription service's confidence scoring to highlight areas of the transcript that journalists might need to manually correct.
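Confidence-based highlighting, which the Edinburgh University team also used, boils down to filtering words by score. A hedged sketch, assuming each word object carries a per-word confidence value (real services expose this in differing formats):

```python
# Illustrative sketch of confidence-based review highlighting, assuming
# word objects carry a per-word confidence score between 0 and 1.
def words_to_review(words, threshold=0.8):
    """Return the words whose recognition confidence falls below
    `threshold`, i.e. the likeliest transcription errors."""
    return [w["text"] for w in words if w["confidence"] < threshold]

words = [
    {"text": "The", "confidence": 0.99},
    {"text": "chancellor", "confidence": 0.62},
    {"text": "said", "confidence": 0.97},
]
print(words_to_review(words))  # ['chancellor'] — flag for manual correction
```

In an editor UI, the same filter would drive styling (say, a coloured underline) rather than producing a list, but the prioritisation logic is identical.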
Trint: Visualising sentiment
Trint's tool visualises sentiment across a video using stream and spider charts. The team demoed two different iterations: one in which spider and stream charts animate alongside played video content, and a second that annotates the transcript itself with colours corresponding to sentiment, in addition to the graphs.
VRT: Tappable video tool for exploring content
With Snapchat-esque interactivity, VRT's tool lets users swipe and tap through video clips to explore content. Users can swipe up to access an auto-generated transcript, tap to navigate to the next segment within the clip, and swipe sideways to find related videos. They can also choose to share a sub-clip from the video file: the tool syncs up with Adobe After Effects to add text overlays featuring a selected section of the transcript.
Edinburgh University: EditingSweet for smarter transcription algorithms
Edinburgh University's EditingSweet tool helps journalists train its speech-to-text service to produce better transcriptions. Using machine learning, it auto-corrects transcription errors pointed out by journalists in real time — meaning that if a new place or person crops up in a breaking news story, all journalists using the tool immediately have access to the improved transcription model. It can also update its models based on the topic of a news story. As with Deutsche Welle's tool, EditingSweet highlights words in the editor for which the transcription service returns a lower confidence score, so journalists know where to review first.
We hope to continue the conversation around applications for machine-assisted transcription technologies at a follow-up #newsHACK later this spring. We're also interested in creating an online community around transcription for interested industry members.