Newshack Language Hack

#newsHack: Language Technology

This week teams from across the UK and the rest of Europe gathered in London’s Shoreditch to compete in the second annual Language Technology newsHack.

The hack focused on the following technologies:

  • Speech to text
  • Text to speech
  • Auto translation between languages
  • Speech synthesis
  • Entity extraction
  • Multilingual graphics

But teams were allowed to use any other tools they deemed necessary.

They were granted access to the BBC’s content API and News Labs’ Juicer to source content.

The categories for the competition were:

  • Best tool for multi-lingual journalists
  • Best audience facing tool
  • Surprise us!

In short, here’s what the teams made:

Team 1, SUMMA

  • live video translation
  • Highlighted extracted entities to help explain when translation isn’t perfect

Team 2, Lancaster University 1

  • multilingual reality check
  • cross reference facts as they happen across languages

Team 3, Lancaster University A

  • Super Mega Linkatron 5000
  • one index for all 30 languages on BBC
  • re-imagining of the BBC site to make all content available across all languages

Team 4, University of Edinburgh 2

  • Czech it out
  • compare stories across languages via various APIs and extraction services
  • check for related content on Twitter

Team 5, BBC R&D and Nexa Center

  • finding quotes and attributing them in news articles
  • avoid the spurious results
  • Ramen - people in the documents
  • quotes in any language
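As a rough illustration of what finding and attributing quotes can look like, here is a minimal sketch. It is not the team's method: it assumes English text, straight double quotes and the simple pattern of a quote followed by "said" or "says" and a capitalised name. The names `QUOTE_THEN_SPEAKER` and `attribute_quotes` are made up for this example.

```python
import re

# Hypothetical pattern: "quoted text," said Firstname Lastname
# Only handles straight double quotes and the verbs "said" / "says".
QUOTE_THEN_SPEAKER = re.compile(
    r'"(?P<quote>[^"]+)"\s*,?\s*(?:said|says)\s+'
    r'(?P<speaker>[A-Z]\w+(?:\s+[A-Z]\w+)*)'
)


def attribute_quotes(text):
    """Return (speaker, quote) pairs for quotes that have an explicit speaker."""
    pairs = []
    for match in QUOTE_THEN_SPEAKER.finditer(text):
        # Drop the trailing comma or full stop that sits inside the quote marks
        quote = match.group("quote").rstrip(",.")
        pairs.append((match.group("speaker"), quote))
    return pairs


article = '"We built it in a day," said Jane Doe. The weather was "fine".'
print(attribute_quotes(article))
```

Quoted text with no speaker next to it, like "fine" above, is skipped, which is one crude way to avoid spurious results.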

Team 6, Southampton University

  • multilingual contextualiser of articles
  • match stories from concepts in one sentence

Team 7, University of Edinburgh 1

  • made a synth from bible readings in Swahili
  • very quick learning for translation and speaking languages
  • built a Machine Translation model for Swahili in 1 day to translate subtitles

Team 8, Queen Mary University

  • cross lingual entity linking

Team 9, BBC Wales/Cymru and Bangor University

  • learning subtitles showing difficult / rare words
  • language support for novice speakers and bilingual low-confidence speakers
  • ‘rare’ words identified by dictionary look up stats and frequency of use
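To give a flavour of how rare words might be picked out by frequency of use, here is a minimal sketch. It is an assumption-laden toy, not the team's implementation: it counts words in a small reference text and flags subtitle words that appear at most `max_count` times. The dictionary look-up stats are not modelled, and `rare_words`, `tokenize` and `max_count` are names invented for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words (\\w+ also matches accented letters)."""
    return re.findall(r"\w+", text.lower())


def rare_words(subtitle, reference_text, max_count=1):
    """Return subtitle words seen at most `max_count` times in the reference text."""
    counts = Counter(tokenize(reference_text))
    seen = set()
    rare = []
    for word in tokenize(subtitle):
        # A Counter returns 0 for unseen words, so unknown words count as rare
        if counts[word] <= max_count and word not in seen:
            seen.add(word)
            rare.append(word)
    return rare


reference = "the cat sat on the mat and the dog sat on the rug"
print(rare_words("the cat sat on the chaise", reference))
```

A real tool would need a much larger reference corpus for the target language, and a threshold tuned to the learner's level.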

Here’s a little photo log of the events as they happened.

Credit to Lancaster University’s Stephen Wattam for some of the photos

  • BBC’s Director of Product & Systems, Robin Pembroke
  • News Labs’ Francesco Negri introducing our tech
  • Pensive: Hmm… so…
  • Workings: … £££!!!
  • Action: Attack…
  • Fuel: Can’t work all the time
  • More action: A haa, then we …
  • News Labs’ Rob Squires giving a round up at the end of day 1
  • Round up stuff: More round up action
  • More round up stuff: So the idea is..
  • Presentations!

And the winners…

Best audience facing tool

Team 9, BBC Wales/Cymru and Bangor University

  • learning subtitles showing difficult / rare words
  • language support for novice speakers and bilingual low-confidence speakers
  • ‘rare’ words identified by dictionary look up stats and frequency of use
  • great practical BBC application

and

Team 3, Lancaster University A

  • Super Mega Linkatron 5000
  • one index for all 30 languages on BBC
  • re-imagining of the BBC site to make all content available across all languages

Best tool for multi-lingual journalists

Team 5, BBC R&D and Nexa Center

  • finding quotes and attributing them in news articles
  • avoid the spurious results
  • Ramen - people in the documents, Chris Newell
  • quotes in any language, pulled from text, attributed to speakers automatically

Surprise us

Team 7, University of Edinburgh 1

  • made a new voice synth from bible readings in Swahili
  • very quick learning for translation and speaking languages
  • built a Machine Translation model for Swahili in 1 day to translate subtitles
