SUMMA - Scalable Understanding of Multilingual MediA

Using a range of combined language technologies to support multilingual media monitoring.

Published: 12 April 2017

Aims

Create a platform to automate the analysis of media streams across many languages.

What is SUMMA?

SUMMA stands for "Scalable Understanding of Multilingual MediA". It was a Big Data project, funded by the EU Horizon 2020 programme, which ran from 2016 to 2019.

This large, multi-partner project pushed the boundaries of multilingual media monitoring by combining several language technologies to support monitoring of the world's media.

SUMMA received the highest performance rating from the project review panel in March 2019.

SUMMA project abstract

Media monitoring enables the global news media to be viewed in terms of emerging trends, people in the news, and the evolution of storylines. The massive growth in the number of broadcast and Internet media channels means that current approaches can no longer cope with the scale of the problem.

The aim of SUMMA was to significantly improve media monitoring by creating a platform to automate the analysis of media streams across many languages, to aggregate and distill the content, to automatically create rich knowledge bases, and to provide visualisations to cope with this deluge of data.

SUMMA had six objectives:

  1. Development of a scalable and extensible media monitoring platform;
  2. Development of high-quality and richer tools for analysts and journalists;
  3. Extensible automated knowledge base construction;
  4. Multilingual and cross-lingual capabilities;
  5. Sustainable, maintainable platform and services;
  6. Dissemination and communication of project results to stakeholders and user group.

Achieving these aims advanced the state of the art in a number of technologies: multilingual stream processing including speech recognition, machine translation, and story identification; entity and relation extraction; natural language understanding including deep semantic parsing, summarisation, and sentiment detection; and rich visualisations based on multiple views and dealing with many data streams.

Experiments were also undertaken into automated fact checking.

The project's primary focus was on two types of media monitoring:

  1. External media monitoring - intelligent tools to address the dramatically increased scale of the global news monitoring problem;
  2. Internal media monitoring - managing content creation in several languages efficiently by ensuring content created in one language is reusable by all other languages.

Conclusions

SUMMA enabled News Labs and BBC Monitoring to gain a deeper understanding of how a range of language technologies may combine to support media monitoring. As part of the project, News Labs created a number of prototypes that were tested by our editorial teams; these are now in various stages of transfer into production. SUMMA has open sourced much of the software and technology produced during the project, and details are available on the SUMMA Blog.

Team

  • Andy Secker

    Former News Labs Language Technology Lead
