in use

SUMMA - Scalable Understanding of Multilingual MediA

Using a range of combined language technologies to support multilingual media monitoring.

Hypothesis

Create a platform to automate the analysis of media streams across many languages.

What is SUMMA?

SUMMA stands for "Scalable Understanding of Multilingual MediA". SUMMA was an EU Horizon 2020-funded big data project that ran from 2016 to 2019.

This large, multi-partner project pushed the boundaries of multilingual media monitoring, combining a range of language technologies to support monitoring of the world's media.

SUMMA received the highest performance rating from the project review panel in March 2019.

SUMMA project abstract

Media monitoring enables the global news media to be viewed in terms of emerging trends, people in the news, and the evolution of story-lines. The massive growth in the number of broadcast and Internet media channels means that current approaches can no longer cope with the scale of the problem.

The aim of SUMMA was to significantly improve media monitoring by creating a platform to automate the analysis of media streams across many languages, to aggregate and distill the content, to automatically create rich knowledge bases, and to provide visualisations to cope with this deluge of data.

SUMMA had six objectives:

  1. Development of a scalable and extensible media monitoring platform;
  2. Development of high-quality and richer tools for analysts and journalists;
  3. Extensible automated knowledge base construction;
  4. Multilingual and cross-lingual capabilities;
  5. Sustainable, maintainable platform and services;
  6. Dissemination and communication of project results to stakeholders and the user group.

Achieving these aims advanced the state of the art in a number of technologies: multilingual stream processing including speech recognition, machine translation, and story identification; entity and relation extraction; natural language understanding including deep semantic parsing, summarisation, and sentiment detection; and rich visualisations based on multiple views and dealing with many data streams.

Experiments were also undertaken into automated fact checking.

The project's primary focus was on two types of media monitoring:

  1. External media monitoring - intelligent tools to address the dramatically increased scale of the global news monitoring problem;
  2. Internal media monitoring - managing content creation in several languages efficiently by ensuring that content created in one language is reusable in all the others.

Conclusions

SUMMA enabled News Labs and BBC Monitoring to gain a deeper understanding of how a range of language technologies can be combined to support media monitoring. As part of the project, News Labs created a number of prototypes that were tested by our editorial teams; these are now at various stages of transfer into production. SUMMA has open-sourced much of the software and technology produced during the project; details are available on the SUMMA Blog.

