SUMMA - Scalable Understanding of Multilingual MediA

Status: active
Create a platform to automate the analysis of media streams across many languages.

What is SUMMA?

SUMMA stands for "Scalable Understanding of Multilingual MediA".

SUMMA is a Big Data project funded by the EU Horizon 2020 programme, submitted in April 2015 by BBC News Labs and partners.

It arose from project collaborations started at #newsHACK III, where we kicked off our work on Language Technology.

This Big Data project will leverage machines to do the heavy lifting in multilingual media monitoring, so that expert editorial staff can find what they need quickly, rather than wasting time sifting through mountains of media by hand.

SUMMA Project Partners

SUMMA Project Abstract

Media monitoring enables the global news media to be viewed in terms of emerging trends, people in the news, and the evolution of story-lines. The massive growth in the number of broadcast and Internet media channels means that current approaches can no longer cope with the scale of the problem.

The aim of SUMMA is to significantly improve media monitoring by creating a platform to automate the analysis of media streams across many languages, to aggregate and distill the content, to automatically create rich knowledge bases, and to provide visualisations to cope with this deluge of data.

SUMMA has six objectives:

  1. Development of a scalable and extensible media monitoring platform;
  2. Development of high-quality and richer tools for analysts and journalists;
  3. Extensible automated knowledge base construction;
  4. Multilingual and cross-lingual capabilities;
  5. Sustainable, maintainable platform and services;
  6. Dissemination and communication of project results to stakeholders and user groups.

Achieving these aims will require advancing the state of the art in a number of technologies: multilingual stream processing including speech recognition, machine translation, and story identification; entity and relation extraction; natural language understanding including deep semantic parsing, summarisation, and sentiment detection; and rich visualisations based on multiple views and dealing with many data streams.

The project will focus on three use cases:

  1. External media monitoring - intelligent tools to address the dramatically increased scale of the global news monitoring problem;
  2. Internal media monitoring - managing content creation in several languages efficiently by ensuring content created in one language is reusable by all other languages;
  3. Data journalism.

The outputs of the project will be field-tested at partners BBC and DW, and the platform will be further validated through innovation intensives such as the BBC NewsHack.

More about this

If you want to find out more about SUMMA, tweet us at @bbc_news_labs

Next Priorities

  1. Prepare backlogs
  2. Work with consortium on project planning